In this notebook, we will process the Gutenberg index to get a list of all fiction titles and their file locations.

Previously, we downloaded a mirror of Project Gutenberg, as set out in this [blog post](https://roboticape.wordpress.com/2018/06/26/getting-all-the-books/).

Luckily, Jon Reeve has prepared a short Python script that downloads and parses the RDF index provided by Gutenberg: see his [parseRDF.py](https://github.com/JonathanReeve/gitenberg-experiments/blob/master/parseRDF.py) file. (For the code below, I downloaded the bzip-compressed RDF archive into the same directory as this notebook.)

In [1]:
from parseRDF import readmetadata
md = readmetadata()

In [2]:
md[1]

{'id': 1,
 'author': 'Jefferson, Thomas',
 'title': 'The Declaration of Independence of the United States of America',
 'downloads': 653,
 'formats': {'application/epub+zip': 'http://www.gutenberg.org/ebooks/1.epub.noimages',
  'application/prs.tex': 'http://www.gutenberg.org/6/5/2/6527/6527-t.zip',
  'text/plain': 'http://www.gutenberg.org/ebooks/1.txt.utf-8',
  'application/rdf+xml': 'http://www.gutenberg.org/ebooks/1.rdf',
  'application/x-mobipocket-ebook': 'http://www.gutenberg.org/ebooks/1.kindle.noimages',
  'text/html': 'http://www.gutenberg.org/ebooks/1.html.images',
  'text/plain; charset=us-ascii': 'http://www.gutenberg.org/files/1/1.zip'},
 'type': 'Text',
 'LCC': {'E201', 'JK'},
 'subjects': {'United States -- History -- Revolution, 1775-1783 -- Sources',
  'United States. Declaration of Independence'},
 'authoryearofbirth': 1743,
 'authoryearofdeath': 1826,
 'language': ['en']}
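
To see roughly where fields such as `title`, `subjects` and `language` come from, here is a simplified sketch that pulls them out of a single RDF record with `xml.etree.ElementTree`. It is not Jon Reeve's code: the namespace URIs and the element nesting (`dcterms:subject/rdf:Description/rdf:value`) are assumptions based on the usual layout of Gutenberg RDF files, so check them against a real file. Unlike `parseRDF.py`, it does not separate LCSH subject headings from LCC class codes (the `md` records keep those in separate `subjects` and `LCC` fields).

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for Project Gutenberg RDF records (check a real file)
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}


def parse_record(xml_text):
    """Extract id, title, subjects and languages from one RDF record (sketch)."""
    root = ET.fromstring(xml_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/1"
    title = ebook.findtext("dcterms:title", default=None, namespaces=NS)
    # Every subject value is collected; LCSH and LCC are not told apart here
    subjects = {
        value.text
        for value in ebook.findall("dcterms:subject/rdf:Description/rdf:value", NS)
    }
    languages = [
        value.text
        for value in ebook.findall("dcterms:language/rdf:Description/rdf:value", NS)
    ]
    return {
        "id": int(about.rsplit("/", 1)[-1]),
        "title": title,
        "subjects": subjects,
        "language": languages,
    }


# Hand-written record modelled on ebook 1 above (not copied from the real file)
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1">
    <dcterms:title>The Declaration of Independence of the United States of America</dcterms:title>
    <dcterms:subject><rdf:Description><rdf:value>United States. Declaration of Independence</rdf:value></rdf:Description></dcterms:subject>
    <dcterms:language><rdf:Description><rdf:value>en</rdf:value></rdf:Description></dcterms:language>
  </pgterms:ebook>
</rdf:RDF>"""

record = parse_record(SAMPLE)
```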

In [3]:
print("We have {0} books in the index".format(len(md)))

We have 57523 books in the index


In [16]:
type(md)

dict

In [42]:
# Keep English-language books with at least one subject heading mentioning
# "fiction". The "non" test is meant to drop "Nonfiction" headings, but it also
# rejects any heading that merely contains "non" (e.g. "Lebanon -- Fiction").
filtered_md = {}

for book in md.values():
    is_fiction = any(
        "fiction" in subject.lower() and "non" not in subject.lower()
        for subject in book['subjects']
    )
    # book['language'] is a list of language codes, e.g. ['en']
    if is_fiction and "en" in book['language']:
        filtered_md[book['id']] = book

print("We have {0} books in the filtered index".format(len(filtered_md)))

We have 16605 books in the filtered index
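
Because `"non" not in subject.lower()` rejects any heading that happens to contain those three letters, a tighter test could treat only explicit spellings such as "nonfiction" and "non-fiction" as negations. The sketch below is a hypothetical alternative, not what produced the count above; it would likely change that count, and it has not been re-run on the real index.

```python
def is_fiction_subject(subject):
    """True for headings mentioning fiction, excluding explicit non-fiction.

    Hypothetical alternative to the "non" substring test: only the spellings
    "nonfiction" and "non-fiction" are treated as negations.
    """
    s = subject.lower()
    return "fiction" in s and "nonfiction" not in s and "non-fiction" not in s


def filter_fiction(md, language="en"):
    """Return {id: record} for books with a fiction heading in `language`."""
    return {
        book["id"]: book
        for book in md.values()
        if language in book["language"]
        and any(is_fiction_subject(s) for s in book["subjects"])
    }


# Tiny hand-made index standing in for `md` (not real Gutenberg data)
sample_md = {
    1: {"id": 1, "subjects": {"Lebanon -- Fiction"}, "language": ["en"]},
    2: {"id": 2, "subjects": {"Nonfiction novels"}, "language": ["en"]},
    3: {"id": 3, "subjects": {"Science fiction"}, "language": ["fr"]},
}
sample_fiction = filter_fiction(sample_md)
```

Here book 1 is kept (the original test would have dropped it), book 2 is rejected as non-fiction and book 3 is rejected because it is not in English.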


In [44]:
for _, book in filtered_md.items():
    print(book['subjects'])

{'Science fiction'}
{'Science fiction', 'Space flight to the moon -- Fiction'}
{'Fiction'}
{'Young women -- Fiction', 'Vampires -- Fiction'}
{'Science fiction'}
{'London (England) -- Fiction', 'Man-woman relationships -- Fiction'}
{'City and town life -- Fiction', 'Short stories, American', 'Immigrants -- Fiction', 'Slums -- Fiction', 'Poor -- Fiction'}
{'Schools -- Juvenile fiction'}
{'Orphans -- Fiction', 'Christian life -- Fiction'}
{'Yukon -- Fiction'}
{'Southern States -- Social life and customs -- Fiction'}
{'United States Naval Academy -- Juvenile fiction'}
{'Scottish Americans -- Virginia -- Fiction', 'Virginia -- History -- Colonial period, ca. 1600-1775 -- Fiction', 'Jamestown (Va.) -- Fiction', 'Historical fiction', 'Glasgow (Scotland) -- Fiction'}
{'Inheritance and succession -- Juvenile fiction', 'Girls -- Juvenile fiction'}
{'Inheritance and succession -- Fiction', 'Death -- Fiction'}
{'Death -- Fiction'}
{'Death -- Fiction'}
{'Manners and customs -- Fiction'}
...
{'Hens -- Juvenile fiction'}
{'Old age -- Juvenile fiction', 'African Americans -- Juvenile fiction', 'United States -- Social life and customs -- 19th century -- Juvenile fiction'}
{'Fantasy literature', 'Toys -- Juvenile fiction'}
{'Conduct of life -- Juvenile fiction', 'Animals -- Juvenile fiction', 'Horses -- Juvenile fiction'}
{'California -- Fiction'}
{'Great Britain -- Colonies -- Juvenile fiction -- Periodicals', 'Gift books -- Periodicals', "Children's stories, English -- Periodicals"}
{'Bumblebees -- Juvenile fiction'}
{'Mate selection -- Fiction', 'Letter writing -- Fiction', 'Invalids -- Fiction', 'Man-woman relationships -- Fiction'}
{'Motherless families -- Juvenile fiction', 'Family -- England -- Juvenile fiction', 'Brothers and sisters -- Juvenile fiction'}
{'Animals -- Juvenile fiction'}
{'Science fiction'}
{'Canadian fiction', 'Detective and mystery stories'}
{'United States -- History -- Civil War, 1861-1865 -- Juvenile fiction'}
...
{'Indians of South America -- Juvenile fiction', 'Naturalists -- Juvenile fiction', 'South America -- Juvenile fiction'}
{'Adventure and adventurers -- Juvenile fiction', 'Voyages and travels -- Juvenile fiction', 'Sailors -- Juvenile fiction', 'Seafaring life -- Juvenile fiction', 'Pirates -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Africa, West -- Juvenile fiction', 'Slaves -- Juvenile fiction'}
{'Adventure stories', 'Hunters -- Fiction', 'Great Plains -- Fiction', 'Hunting stories', 'Wolves -- Fiction', 'Bears -- Fiction'}
{'Detective and mystery stories', 'Private investigators -- England -- Fiction', 'Holmes, Sherlock (Fictitious character) -- Fiction'}
{'Fantasy fiction'}
{'Spain -- Fiction'}
{"Children's stories", 'Conduct of life -- Juvenile fiction', 'Children -- Conduct of life -- Juvenile fiction'}
{'Spy stories', 'Boys -- Juvenile fiction'}
{'Africa -- Juvenile fiction'}
...
{'Young women -- Fiction', 'Mate selection -- Fiction', 'Italians -- America -- Fiction', 'Love stories'}
{'Science fiction'}
{'Older people -- Fiction', 'Love stories'}
{'Science fiction', 'Short stories'}
{'Seafaring life -- Fiction', 'Great Britain. Royal Navy -- Fiction', 'Smugglers -- Fiction', 'Sea stories', 'Pirates -- Fiction'}
{'Science fiction'}
{'Voyages and travels -- Juvenile fiction', 'Great-uncles -- Juvenile fiction', 'Obedience -- Juvenile fiction', 'Brothers and sisters -- Juvenile fiction', 'Farmers -- Juvenile fiction', 'Letters -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Runaway children -- Juvenile fiction', 'Selfishness -- Juvenile fiction', 'Intergenerational relations -- Juvenile fiction', 'Children -- Conduct of life -- Juvenile fiction'}
{'Americans -- Italy -- Juvenile fiction', 'Nephews -- Juvenile fiction', 'Uncles -- Juvenile fiction'}
{'Science fiction'}
{'Science fiction'}
...
{'Sussex (England) -- Fiction'}
{'Romanies -- England -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Snobs and snobbishness -- Juvenile fiction', 'Social classes -- England -- Juvenile fiction'}
{'Great Britain -- History -- George III, 1760-1820 -- Fiction', 'Mystery fiction'}
{'Soldiers -- India -- Juvenile fiction', 'India -- Juvenile fiction', 'East India Company. Army -- Juvenile fiction', 'India -- Kings and rulers -- Juvenile fiction'}
{'China -- History -- Boxer Rebellion, 1899-1901 -- Juvenile fiction', 'Adventure and adventurers -- Juvenile fiction', 'Prisoners -- Juvenile fiction', 'Youth and death -- Juvenile fiction', 'Soldiers -- Juvenile fiction', 'Uncles -- Juvenile fiction', 'Fathers and sons -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Youth -- Conduct of life -- Juvenile fiction'}
...
{'Adventure stories', 'Supernatural -- Fiction', 'Short stories, English', 'Mystery fiction'}
{'Zines', 'Science fiction -- Periodicals'}
{'Zines', 'Science fiction -- Periodicals'}
{'Zines', 'Science fiction -- Periodicals'}
{'World War, 1914-1918 -- Juvenile fiction', 'United States. Navy -- Juvenile fiction', 'Submarines (Ships) -- Germany -- Juvenile fiction'}
{'Girls -- Juvenile fiction', 'Vacations -- Juvenile fiction'}
{'Catholics -- Fiction', 'Inheritance and succession -- Fiction', 'Great Britain -- History -- 19th century -- Fiction', 'Country life -- England -- Fiction', 'Mistaken identity -- Fiction'}
{'Science fiction'}
{'Voyages and travels -- Juvenile fiction', 'Queens -- Juvenile fiction', 'Cousins -- Juvenile fiction', 'Seafaring life -- Juvenile fiction', 'Sick -- Juvenile fiction', 'Prayer -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Youth -- Conduct of life -- Juvenile fiction', 'Christian life -- Juvenile fiction'}
...
{'Chivalry -- Fiction', 'Romances, Spanish -- Translations into English', 'Knights and knighthood -- Fiction'}
{'Ireland -- Fiction', 'Fantasy fiction, English'}
{'Science fiction', 'Short stories', 'Time travel -- Fiction'}
{'Human-alien encounters -- Fiction', 'Space colonies -- Fiction', 'Extraterrestrial beings -- Fiction', 'Science fiction'}
{'French fiction -- Translations into English', 'Short stories, French -- Translations into English'}
{'Animals -- Juvenile fiction', 'Muskrat -- Juvenile fiction'}
{'Science fiction', 'Short stories', 'Love -- Fiction', 'Chance -- Fiction'}
{'Man-woman relationships -- Fiction', 'Science fiction', 'Post-apocalyptic fiction', 'Science fiction -- Authorship -- Fiction', 'Time travel -- Fiction'}
{'Boys -- Fiction', 'Diary fiction'}
{'Science fiction', 'Short stories'}
{'Science fiction'}
{'Space colonies -- Fiction', 'Mars (Planet) -- Fiction', 'Science fiction'}
...
{'Adventure stories', 'Tarzan (Fictitious character) -- Fiction', 'Fantasy fiction'}
{'Science fiction', 'Fantasy fiction'}
{'Science fiction', 'Earth (Planet) -- Core -- Fiction', 'Adventure stories'}
{'Adventure stories', 'Apes -- Fiction', 'Jungles -- Fiction', 'Fantasy fiction', 'Tarzan (Fictitious character) -- Fiction', 'Africa -- Fiction'}
{'Science fiction', 'England -- Fiction', 'Fantasy fiction'}
{'Adventure stories', 'Fantasy fiction', 'Kings and rulers -- Fiction'}
{'Historical fiction', 'Great Britain -- History -- Henry III, 1216-1272 -- Fiction', 'Outlaws -- Fiction'}
{'Science fiction', 'Dinosaurs -- Fiction', 'Prehistoric peoples -- Fiction', 'Lost continents -- Fiction'}
{'Science fiction', 'Lost continents -- Fiction'}
{'Science fiction', 'Earth (Planet) -- Core -- Fiction', 'Adventure stories'}
{'Science fiction', 'Mars (Planet) -- Fiction'}
{'Adventure stories', 'Tarzan (Fictitious character) -- Fiction', 'Fantasy fiction'}
...
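
Printing one subject set per book is hard to scan across 16,605 books. A frequency table of headings gives a more compact overview; a minimal sketch with `collections.Counter`, run here on a made-up three-book sample rather than on `filtered_md`:

```python
from collections import Counter


def subject_counts(books):
    """Count how many books carry each subject heading."""
    counts = Counter()
    for book in books.values():
        # Each book's subjects are a set, so a heading counts once per book
        counts.update(book["subjects"])
    return counts


# Made-up sample; on the real data try subject_counts(filtered_md).most_common(20)
sample_books = {
    1: {"subjects": {"Science fiction"}},
    2: {"subjects": {"Science fiction", "Short stories"}},
    3: {"subjects": {"Death -- Fiction"}},
}
top = subject_counts(sample_books).most_common(1)
```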

In [45]:
for _, book in filtered_md.items():
    print(book['title'], book['author'])

The House on the Borderland Hodgson, William Hope
A Voyage to the Moon: With Some Account of the Manners and Customs, Science and Philosophy, of the People of Morosofia, and Other Lunarians Tucker, George
La Fiammetta Boccaccio, Giovanni
Carmilla Le Fanu, Joseph Sheridan
The Mystery White, Stewart Edward
Tenterhooks Leverson, Ada
Gaslight Sonatas Hurst, Fannie
The Triple Alliance, Its Trials and Triumphs Avery, Harold
A Beautiful Possibility Black, Edith Ferguson
The Magnetic North Robins, Elizabeth
The Rivet in Grandfather's Neck: A Comedy of Limitations Cabell, James Branch
Dave Darrin's Second Year at Annapolis: Or, Two Midshipmen as Naval Academy "Youngsters" Hancock, H. Irving (Harrie Irving)
Salute to Adventurers Buchan, John
Billie Bradley and Her Inheritance; Or, The Queer Homestead at Cherry Corners Wheeler, Janet D.
Old Lady Mary: A Story of the Seen and the Unseen Oliphant, Mrs. (Margaret)
A Little Pilgrim: Stories of the Seen and the Unseen Oliphant, Mrs. (Margaret)
...
The Commission in Lunacy Balzac, Honoré de
Kernel Cob And Little Miss Sweetclover Mitchell, George
Legend of Moulin Huet Freeth, Lizzie A.
The White Riband; Or, A Young Female's Folly Jesse, F. Tennyson (Fryniwyd Tennyson)
Domestic Peace Balzac, Honoré de
The Marriage of William Ashe Ward, Humphry, Mrs.
Toni, the Little Woodcarver Spyri, Johanna
Masterman Ready Marryat, Frederick
The Outdoor Chums on the Gulf; Or, Rescuing the Lost Balloonists Allen, Quincy
David Balfour: Being Memoirs Of His Adventures At Home And Abroad, The Second Part: In Which Are Set Forth His Misfortunes Anent The Appin Murder; His Troubles With Lord Advocate Grant; Captivity On The Bass Rock; Journey Into Holland And France; And Singular Relations With James More Drummond Or Macgregor, A Son Of The Notorious Rob Roy, And His Daughter Catriona Stevenson, Robert Louis
The Outdoor Girls at the Hostess House; Or, Doing Their Best for the Soldiers Hope, Laura Lee
Tom Tiddler's Ground Dickens, Charles
...
The Parts Men Play Baxter, Beverley
Other People's Money Gaboriau, Emile
Six Little Bunkers at Cousin Tom's Hope, Laura Lee
The Stolen Singer Bellinger, Martha Idell Fletcher
Elsie at Home Finley, Martha
Ole Mammy's Torment Johnston, Annie F. (Annie Fellows)
When Knighthood Was in Flower: or, the Love Story of Charles Brandon and Mary Tudor the King's Sister, and Happening in the Reign of His August Majesty King Henry the Eighth Major, Charles
Cousin Betty Balzac, Honoré de
The Picture of Dorian Gray Wilde, Oscar
The Return of the Native Hardy, Thomas
A Little Mother to the Others Meade, L. T.
Everybody's Lonesome: A True Fairy Story Laughlin, Clara E. (Clara Elizabeth)
El Dorado: An Adventure of the Scarlet Pimpernel Orczy, Emmuska Orczy, Baroness
Maida's Little Shop Gillmore, Inez Haynes
Two Knapsacks: A Novel of Canadian Summer Life Campbell, John
The Jester of St. Timothy's Pier, Arthur Stanwood
Princess McClelland, M. G. (Mary Greenway)
...
Torchy Ford, Sewell
Torchy, Private Sec. Ford, Sewell
Torchy and Vee Ford, Sewell
Torchy As A Pa Ford, Sewell
Molly Brown's Orchard Home Speed, Nell
The Trail of the White Mule Bower, B. M.
Through Three Campaigns: A Story of Chitral, Tirah and Ashanti Henty, G. A. (George Alfred)
The Nabob, Vol. 1 (of 2) Daudet, Alphonse
Oomphel in the Sky Piper, H. Beam
Left End Edwards Barbour, Ralph Henry
A Jolly Fellowship Stockton, Frank Richard
Ministry of Disturbance Piper, H. Beam
Dick Hamilton's Airship; Or, A Young Millionaire in the Clouds Garis, Howard Roger
The Romance of an Old Fool Field, Roswell Martin
Wildfire Grey, Zane
A Christmas Carol Dickens, Charles
The Triumphs of Eugene Valmont Barr, Robert
The Tory Maid Stimpson, Herbert Baird
Northanger Abbey Austen, Jane
Pride and Prejudice Austen, Jane
Pride and Prejudice Austen, Jane
Ragged Dick Alger, Horatio, Jr.
Keziah Coffin Lincoln, Joseph Crosby
Sarrasine Balzac, Honoré de
The Jungle Baby Farrow, G. E. (George Edward)
...
Aucassin and Nicolette: translated from the Old French None
Rich Enough: a tale of the times Lee, Hannah Farnham Sawyer
The Servant Problem Young, Robert F.
Mistress Anne Bailey, Temple
The Cursed Patois: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Black Feather: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Blue Man: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
A House to Let Procter, Adelaide Anne
The Skeleton On Round Island: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
Marianson: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Indian On The Trail: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Mothers Of Honoré: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Cobbler In The Devil's Kitchen: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
...
My Lady Ludlow Gaskell, Elizabeth Cleghorn
John Ingerfield, and Other Stories Jerome, Jerome K. (Jerome Klapka)
Hunters Out of Space Kelleam, Joseph Everidge
The Varmint Johnson, Owen
Rollo at Work Abbott, Jacob
The Sorrows of Young Werther Goethe, Johann Wolfgang von
Explorers of the Dawn De la Roche, Mazo
Pharaoh's Broker: Being the Very Remarkable Experiences in Another World of Isidor Werner Douglass, Ellsworth
Wood Magic: A Fable Jefferies, Richard
The Adventures of Danny Meadow Mouse Burgess, Thornton W. (Thornton Waldo)
Jack: 1877 Daudet, Alphonse
The Bad Boy at Home, and His Experiences in Trying to Become an Editor: 1885 Victor, Metta Victoria Fuller
Memoirs Of Fanny Hill: A New and Genuine Edition from the Original Text (London, 1749) Cleland, John
Drolls From Shadowland Pearce, J. H. (Joseph Henry)
The Half-Brothers Gaskell, Elizabeth Cleghorn
The Great K. & A. Robbery Ford, Paul Leicester
Deerfoot in The Mountains Ellis, Edward Sylvester
...
The Cryptogram: A Novel De Mille, James
It's a Small Solar System Howard, Allan
The Helpful Robots Shea, Robert
The Comings of Cousin Ann Sampson, Emma Speed
The Dark Star Chambers, Robert W. (Robert William)
A Modern Cinderella Douglas, Amanda M.
Ned, Bob and Jerry on the Firing Line; Or, The Motor Boys Fighting for Uncle Sam Young, Clarence
The Readjustment Irwin, Will
Turn About Eleanor Kelley, Ethel M. (Ethel May)
Uncle Terry: A Story of the Maine Coast Munn, Charles Clark
The Campfire Girls of Roselawn; Or, a Strange Message from the Air Penrose, Margaret
The Motor Boat Club and The Wireless; Or, the Dot, Dash and Dare Cruise Hancock, H. Irving (Harrie Irving)
The Fatal Boots Thackeray, William Makepeace
Stopover Gerken, William
I Like Martian Music Fritch, Charles E.
Flight Through Tomorrow Coblentz, Stanton A. (Stanton Arthur)
Heart of the Blue Ridge Baily, Waldron
In Brief Authority Anstey, F.
Sir Nigel Doyle, Arthur Conan
Bolden's Pets Wallace, F. L. (Floyd L.)
...
Sketches New and Old Twain, Mark
Tom Burke Of "Ours", Volume I Lever, Charles James
Tom Burke Of "Ours", Volume II Lever, Charles James
Doctor Bolus and His Patients Unknown
1601: Conversation as it was by the Social Fireside in the Time of the Tudors Twain, Mark
The Pictures; The Betrothing: Novels Tieck, Ludwig
An Artist in Crime Ottolengui, Rodrigues
The Works of Robert Louis Stevenson - Swanston Edition, Vol. 10 Stevenson, Robert Louis
Goldsmith's Friend Abroad Again Twain, Mark
Felony Causey, James
Eden: An Episode Saltus, Edgar
The Portal of Dreams Buck, Charles Neville
Big Pill Gallun, Raymond Z.
The Song of the Wolf Mayer, Frank
Contamination Crew Nourse, Alan Edward
Evil Out of Onzar Ganes, Mark
A Christian But a Roman Jókai, Mór
Leonore Stubbs Walford, Lucy Bethia
St. Peter's Umbrella: A Novel Mikszáth, Kálmán
Thompson's Cat Williams, Robert Moore
Garth and the Visitor Stecher, L. J., Jr.
Joy Ride Meadows, Mark
The Ethical Way Farrell, Joseph
Brknk's Bounty Sohl, Jerry
...
The Japanese Twins Perkins, Lucy Fitch
Pierre; or The Ambiguities Melville, Herman
Among the Forest People Pierson, Clara Dillingham
Whispering Walls Wirt, Mildred A. (Mildred Augustine)
Burning Sands Weigall, Arthur E. P. Brome (Arthur Edward Pearse Brome)
The Swiss Twins Perkins, Lucy Fitch
The Adventures of a Freshman Williams, Jesse Lynch
The Blacksmith's Hammer; or, The Peasant Code: A Tale of the Grand Monarch Sue, Eugène
The Professor's Mystery Hastings, Wells
Too Rich: A Romance Streckfuss, Adolf
The Delafield Affair Kelly, Florence Finch
The Count of Nideck: adapted from the French of Erckmann-Chartrian Fiske, Ralph Browning
Jo's Boys Alcott, Louisa May
The Harvester Stratton-Porter, Gene
Among the Pond People Pierson, Clara Dillingham
The House in the Mist Green, Anna Katharine
Abbé Aubain and Mosaics Mérimée, Prosper
Vineta, the Phantom City Werner, E.
Concerning Belinda Brainerd, Eleanor Hoyt
A Search For A Secret: A Novel. Vol. 1 Henty, G. A. (George Alfred)
...
The Honour of the Clintons Marshall, Archibald
Love Among the Lions: A Matrimonial Experience Anstey, F.
A Walk and a Drive. Miller, Thomas
The Affair at the Semiramis Hotel Mason, A. E. W. (Alfred Edward Woodley)
The Courtship of Morrice Buckler: A Romance Mason, A. E. W. (Alfred Edward Woodley)
For Jacinta Bindloss, Harold
Hoof and Claw Roberts, Charles G. D., Sir
Miranda of the Balcony: A Story Mason, A. E. W. (Alfred Edward Woodley)
Parson Kelly Mason, A. E. W. (Alfred Edward Woodley)
The Truants Mason, A. E. W. (Alfred Edward Woodley)
The Turnstile Mason, A. E. W. (Alfred Edward Woodley)
The Watchers: A Novel Mason, A. E. W. (Alfred Edward Woodley)
Peter Binney: A Novel Marshall, Archibald
The Maker of Opportunities Gibbs, George
The Black Moth: A Romance of the XVIIIth Century Heyer, Georgette
Istar of Babylon: A Phantasy Potter, Margaret Horton
Carry On! A Story of the Fight for Bagdad Strang, Herbert
A Singular Metamorphosis Skiles, May Evelyn
...
Yonder Young, E. H. (Emily Hilda)
The Radio Detectives in the Jungle Verrill, A. Hyatt (Alpheus Hyatt)
The Camp Fire Girls on a Yacht Sanderson, Margaret Love
By the Barrow River, and Other Stories Leamy, Edmund
Girls of the True Blue Meade, L. T.
Mason of Bar X Ranch Bennett, Henry Holcomb
The Cornish Fishermen's Watch-Night, and Other Stories Anonymous
The Radio Detectives Under the Sea Verrill, A. Hyatt (Alpheus Hyatt)
The Adopted Daughter: A Tale for Young Persons Sandham, Elizabeth
Uncle Wiggily in Wonderland Garis, Howard Roger
Washer the Raccoon Walsh, George E.
The Deacon: An Original Comedy Drama in Five Acts Dale, Horace C.
A Little Girl in Old San Francisco Douglas, Amanda M.
The Last Call: A Romance (Vol. 1 of 3) Dowling, Richard
The Last Call: A Romance (Vol. 2 of 3) Dowling, Richard
The Last Call: A Romance (Vol. 3 of 3) Dowling, Richard
The Duke's Sweetheart: A Romance Dowling, Richard
Under St Paul's: A Romance Dowling, Richard
The Lady of Lynn Besant, Walter
...
Outpost Austin, Jane G. (Jane Goodwin)
The Mission of Poubalov Burton, Frederick R. (Frederick Russell)
Our World; Or, the Slaveholder's Daughter Adams, F. Colburn (Francis Colburn)
The Vintage: A Romance of the Greek War of Independence Benson, E. F. (Edward Frederic)
Grapes of wrath Cable, Boyd
Clown, the Circus Dog Vimar, A. (Auguste)
The Pioneer Boys of the Ohio; or, Clearing the Wilderness Rathborne, St. George
The Pioneer Boys on the Great Lakes; or, On the Trail of the Iroquois Rathborne, St. George
The Pioneer Boys of the Mississippi; or, The Homestead in the Wilderness Rathborne, St. George
The Pioneer Boys of the Missouri; or, In the Country of the Sioux Rathborne, St. George
The Pioneer Boys of the Yellowstone; or, Lost in the Land of Wonders Rathborne, St. George
The Pioneer Boys of the Columbia; or, In the Wilderness of the Great Northwest Rathborne, St. George
The Princess of Cleves La Fayette, Madame de (Marie-Madeleine Pioche de La Vergne)
...
Margret Howth: A Story of To-day Davis, Rebecca Harding
The Border Boys Along the St. Lawrence Goldfrap, John Henry
All the People Lafferty, R. A.
Jamieson Doede, William R.
Mystery of the Chinese Ring: A Biff Brewster Mystery Adventure Adams, Andy
A Fall of Glass Lee, Stanley R.
Solid Solution Stamers, James
The Silent Call Royle, Edwin Milton
A Matter of Protocol Sharkey, Jack
Sales Talk Blomberg, Con
The Sorceress; v. 1 of 3 Oliphant, Mrs. (Margaret)
The Treasure Lagerlöf, Selma
Extracts from the Galactick Almanack: Music Around the Universe Janifer, Laurence M.
Always a Qurono Harmon, Jim
Delaware; or, The Ruined Family. Vol. 1 James, G. P. R. (George Payne Rainsford)
Delaware; or, The Ruined Family. Vol. 2 James, G. P. R. (George Payne Rainsford)
Agatha Webb Green, Anna Katharine
Delaware; or, The Ruined Family. Vol. 3 James, G. P. R. (George Payne Rainsford)
Guy Garrick Reeve, Arthur B. (Arthur Benjamin)
Lucinda Hope, Anthony
The Vicissitudes of Evangeline Glyn, Elinor
...
The Pomp of the Lavilettes, Complete Parker, Gilbert
At the Sign of the Eagle Parker, Gilbert
The Trespasser, Volume 1 Parker, Gilbert
The Trespasser, Volume 2 Parker, Gilbert
The Trespasser, Volume 3 Parker, Gilbert
The Trespasser, Complete Parker, Gilbert
The March of the White Guard Parker, Gilbert
The Seats of the Mighty, Volume 1 Parker, Gilbert
The Seats of the Mighty, Volume 2 Parker, Gilbert
The Seats of the Mighty, Volume 3 Parker, Gilbert
The Seats of the Mighty, Volume 4 Parker, Gilbert
The Seats of the Mighty, Volume 5 Parker, Gilbert
The Seats of the Mighty, Complete Parker, Gilbert
The Battle of the Strong: A Romance of Two Kingdoms — Volume 1 Parker, Gilbert
The Battle of the Strong: A Romance of Two Kingdoms — Volume 2 Parker, Gilbert
The Battle of the Strong: A Romance of Two Kingdoms — Volume 3 Parker, Gilbert
The Battle of the Strong: A Romance of Two Kingdoms — Volume 4 Parker, Gilbert
The Battle of the Strong: A Romance of Two Kingdoms — Volume 5 Parker, Gilbert
...
The Village Uncle (From "Twice Told Tales") Hawthorne, Nathaniel
The Sister Years (From "Twice Told Tales") Hawthorne, Nathaniel
Snow Flakes (From "Twice Told Tales") Hawthorne, Nathaniel
The Seven Vagabonds (From "Twice Told Tales") Hawthorne, Nathaniel
The White Old Maid (From "Twice Told Tales") Hawthorne, Nathaniel
Chippings with a Chisel (From "Twice Told Tales") Hawthorne, Nathaniel
Beneath an Umbrella (From "Twice Told Tales") Hawthorne, Nathaniel
The Lily's Quest (From "Twice Told Tales") Hawthorne, Nathaniel
Footprints on the Sea-Shore (From "Twice Told Tales") Hawthorne, Nathaniel
Edward Fane's Rosebud (From "Twice Told Tales") Hawthorne, Nathaniel
The Threefold Destiny (From "Twice Told Tales") Hawthorne, Nathaniel
The Old Manse (From "Mosses from an Old Manse") Hawthorne, Nathaniel
Fire Worship (From "Mosses from an Old Manse") Hawthorne, Nathaniel
Buds and Bird Voices (From "Mosses from an Old Manse") Hawthorne, Nathaniel
...
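
The title listing shows the same title/author pair more than once (e.g. "Pride and Prejudice" by Austen, Jane) and multi-volume works next to "Complete" editions (e.g. Parker's "The Trespasser"). A minimal sketch for spotting exact duplicates, assuming we want to review them before using the list as a corpus; it does not detect volume/complete variants, which would need separate title normalisation. The IDs below are made up.

```python
from collections import defaultdict


def duplicate_titles(books):
    """Map (title, author) -> sorted list of IDs, for pairs seen more than once."""
    groups = defaultdict(list)
    for book_id, book in books.items():
        groups[(book["title"], book["author"])].append(book_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Hypothetical IDs; on the real data call duplicate_titles(filtered_md)
sample_titles = {
    101: {"title": "Pride and Prejudice", "author": "Austen, Jane"},
    102: {"title": "Pride and Prejudice", "author": "Austen, Jane"},
    103: {"title": "Northanger Abbey", "author": "Austen, Jane"},
}
dupes = duplicate_titles(sample_titles)
```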

In [24]:
import pickle, gzip
# A context manager flushes and closes the gzip stream once the dump is done
with gzip.open("Fiction Books", 'wb') as fh:
    pickle.dump(filtered_md, fh, protocol=-1)
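
To reuse the filtered index in another notebook, it can be loaded back with the same `gzip` + `pickle` pair. A minimal sketch (`protocol=-1` above is equivalent to `pickle.HIGHEST_PROTOCOL`); the round trip below uses a temporary file and a one-entry dict rather than the real "Fiction Books" file. As always with pickle, only load files you created or trust.

```python
import gzip
import os
import pickle
import tempfile


def save_gz_pickle(obj, path):
    """Pickle `obj` into a gzip-compressed file."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gz_pickle(path):
    """Load an object written by save_gz_pickle, e.g. load_gz_pickle("Fiction Books")."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.gz")
    save_gz_pickle({10002: {"title": "The House on the Borderland"}}, demo_path)
    restored = load_gz_pickle(demo_path)
```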

In [26]:
import os
files = os.walk(os.path.curdir)

In [28]:
f = list(files)

In [29]:
f[1]

('./etext04',
 [],
 ['ajtl10h.zip',
  'cfrz10h.zip',
  'pgjr10.zip',
  'cfrz10.zip',
  'pgjr10h.zip',
  'ajtl10.zip',
  'esbio10.zip'])

In [30]:
f

[('.',
  ['etext04',
   'etext00',
   '__pycache__',
   'etext01',
   '7',
   'etext97',
   '.vscode',
   'etext96',
   'etext02',
   '8',
   'etext03',
   '.ipynb_checkpoints',
   '3',
   '2',
   '5',
   '9',
   '6',
   '4',
   '0',
   'etext98',
   '1'],
  ['GUTINDEX.ALL',
   'Get List of Fiction Titles.ipynb',
   'Notes.txt',
   'rdf-files.tar.bz2',
   'GUTINDEX.zip',
   'log_gutenberg',
   'md.pickle.gz',
   'log_filtered_gutenberg',
   'Fiction Books',
   'gutenberg.code-workspace',
   'parseRDF.py']),
 ('./etext04',
  [],
  ['ajtl10h.zip',
   'cfrz10h.zip',
   'pgjr10.zip',
   'cfrz10.zip',
   'pgjr10h.zip',
   'ajtl10.zip',
   'esbio10.zip']),
 ('./etext00',
  [],
  ['mklmt10.zip', 'bgopr10.zip', 'utrkj10.zip', 'wldsp10.zip', 'balen10.zip']),
 ('./__pycache__', [], ['parseRDF.cpython-36.pyc']),
 ('./etext01',
  [],
  ['qltfk10.zip',
   'alpsn10.zip',
   'crsnk10.zip',
   'rmlav10.zip',
   'rmlav10h.zip',
   'pntvw10.zip',
   'penbr10.zip']),
 ('./7',
  ['72',
   '73',
   '75',
   ...

How to get a list of the ZIP files:

* `os.walk` yields a (root, dirs, files) tuple for each directory under the starting point;
* for each file name, check that it contains ".zip" and that the part before the first "." is all digits (`isdigit()`), i.e. a numeric book ID;
* record a (path, book ID) tuple for every match, where the path joins "root" and the file name.


In [37]:
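filelist = []
for root, dirs, files in os.walk(os.path.curdir):
    for file in files:
        if ".zip" in file:
            # "10002.zip" -> "10002"; variants such as "10002-8.zip" or
            # "10002-h.zip" fail isdigit() and are skipped
            bookid = file.split(".")[0]
            if bookid.isdigit():
                filelist.append((os.path.join(root, file), bookid))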

In [38]:
filelist[0:20]

[('./7/72/72.zip', '72'),
 ('./7/73/73.zip', '73'),
 ('./7/75/75.zip', '75'),
 ('./7/77/77.zip', '77'),
 ('./7/78/78.zip', '78'),
 ('./7/7/779/779.zip', '779'),
 ('./7/7/777/777.zip', '777'),
 ('./7/7/776/776.zip', '776'),
 ('./7/7/7/7774/7774.zip', '7774'),
 ('./7/7/7/7777/7777.zip', '7777'),
 ('./7/7/7/7779/7779.zip', '7779'),
 ('./7/7/7/7773/7773.zip', '7773'),
 ('./7/7/775/775.zip', '775'),
 ('./7/7/8/7784/7784.zip', '7784'),
 ('./7/7/8/7787/7787.zip', '7787'),
 ('./7/7/8/7789/7789.zip', '7789'),
 ('./7/7/8/7788/7788.zip', '7788'),
 ('./7/7/8/7783/7783.zip', '7783'),
 ('./7/7/8/7786/7786.zip', '7786'),
 ('./7/7/8/7782/7782.zip', '7782')]
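
The paths above suggest a predictable layout: every digit of the ID except the last becomes a directory level, followed by a folder named after the full ID (72 → `./7/72/72.zip`, 7774 → `./7/7/7/7774/7774.zip`). This is inferred from the listing only, not from Gutenberg documentation, and it is not checked for single-digit IDs or the older `etext*` folders. Under that assumption, a hypothetical helper can compute a book's expected path directly:

```python
import os


def expected_zip_path(book_id, root=os.path.curdir):
    """Guess the mirror path of a book's plain ZIP from its numeric ID.

    Assumed layout, inferred from the os.walk listing: each digit of the ID
    except the last is a directory level, then a folder named after the ID.
    Single-digit IDs and the etext* folders are not handled.
    """
    digits = str(book_id)
    if len(digits) < 2:
        raise ValueError("layout for single-digit IDs is not covered here")
    # Unpacking the string yields one path component per leading digit
    return os.path.join(root, *digits[:-1], digits, digits + ".zip")
```

With `os.path.exists(expected_zip_path(book_id))`, a single book can be checked without walking the whole mirror, while the walk above remains the safer way to find what is actually on disk.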

In [39]:
import pickle
with open("BookPaths.pkl", 'wb') as fh:
    pickle.dump(filelist, fh)

In [40]:
len(filelist)

37930

In [46]:
# Attach the local ZIP path to each fiction record; IDs not in filtered_md are skipped
for path, bookid in filelist:
    book = filtered_md.get(int(bookid))
    if book is not None:
        book['path'] = path
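
The walk found 37,930 numeric ZIP files against 57,523 index entries, so some fiction records may not receive a `path`. A minimal sketch for listing those records, shown on a made-up two-entry sample (the path string is illustrative only):

```python
def missing_paths(books):
    """Return sorted IDs of records that did not get a local 'path'."""
    return sorted(book_id for book_id, book in books.items() if "path" not in book)


# Made-up sample; on the real data call missing_paths(filtered_md)
sample_records = {
    1: {"title": "Book with a local ZIP", "path": "./example/1.zip"},
    2: {"title": "Book without a local ZIP"},
}
missing = missing_paths(sample_records)
```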

In [49]:
list(filtered_md.items())[0:10]

[(10002,
  {'id': 10002,
   'author': 'Hodgson, William Hope',
   'title': 'The House on the Borderland',
   'downloads': 551,
   'formats': {'text/html; charset=iso-8859-1': 'http://www.gutenberg.org/files/10002/10002-h.zip',
    'application/x-mobipocket-ebook': 'http://www.gutenberg.org/ebooks/10002.kindle.noimages',
    'text/plain; charset=iso-8859-1': 'http://www.gutenberg.org/files/10002/10002-8.zip',
    'application/zip': 'http://www.gutenberg.org/files/10002/10002.zip',
    'application/epub+zip': 'http://www.gutenberg.org/ebooks/10002.epub.images',
    'text/plain; charset=us-ascii': 'http://www.gutenberg.org/files/10002/10002.txt',
    'application/rdf+xml': 'http://www.gutenberg.org/ebooks/10002.rdf',
    'text/plain': 'http://www.gutenberg.org/ebooks/10002.txt.utf-8'},
   'type': 'Text',
   'LCC': {'PR'},
   'subjects': {'Science fiction'},
   'authoryearofbirth': 1877,
   'authoryearofdeath': 1918,
   'language': ['en'],
   'path': './1/0/0/0/10002/10002.zip'}),
 ...

In [51]:
import pickle, gzip
with gzip.open("fiction_list.gz", 'wb') as fh:
    pickle.dump(filtered_md, fh, protocol=-1)
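
With each fiction record now carrying a `path`, a natural next step is reading the text out of the archive. A minimal sketch with `zipfile`, assuming the plain `ID.zip` archives contain a `.txt` member (as the "text/plain; charset=us-ascii" entry pointing at `1.zip` in `md[1]` suggests). The default `latin-1` decoding is also an assumption: it accepts any byte sequence, so ASCII files come through unchanged, but UTF-8 files would be mis-decoded rather than rejected. The demo uses an in-memory archive instead of a real mirror file.

```python
import io
import zipfile


def read_book_text(zip_source, encoding="latin-1"):
    """Return the text of the first .txt member of a Gutenberg ZIP (sketch).

    `zip_source` may be a path (such as book['path']) or a binary file object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        txt_members = [n for n in zf.namelist() if n.lower().endswith(".txt")]
        if not txt_members:
            raise ValueError("no .txt member in archive")
        return zf.read(txt_members[0]).decode(encoding)


# Demo on an in-memory archive standing in for a real ./N/.../ID.zip file
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("12345.txt", "It was a dark and stormy night.")
demo_text = read_book_text(buf)
```

On the real data this would be called as, for example, `read_book_text(filtered_md[10002]['path'])`.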