In this notebook, we process the Project Gutenberg index to get a list of all fiction titles and their file locations.

Previously, we downloaded a mirror of Project Gutenberg as set out in this [blogpost](https://roboticape.wordpress.com/2018/06/26/getting-all-the-books/).

Luckily, Jon Reeve has kindly prepared a short Python script to download and parse the RDF index provided by Gutenberg. Check out his [parseRDF.py](https://github.com/JonathanReeve/gitenberg-experiments/blob/master/parseRDF.py) file. (For the code below, I have already downloaded the RDF bzip file, which sits in the same directory as this notebook.)

In [1]:
from parseRDF import readmetadata
md = readmetadata()

In [2]:
md[1]

{'id': 1,
 'author': 'Jefferson, Thomas',
 'title': 'The Declaration of Independence of the United States of America',
 'downloads': 386,
 'formats': {'application/rdf+xml': 'https://www.gutenberg.org/ebooks/1.rdf',
  'application/zip': 'https://www.gutenberg.org/files/1/1-0.zip',
  'application/epub+zip': 'https://www.gutenberg.org/ebooks/1.epub.noimages',
  'application/x-mobipocket-ebook': 'https://www.gutenberg.org/ebooks/1.kindle.images',
  'text/html': 'https://www.gutenberg.org/files/1/1-h/1-h.htm',
  'image/jpeg': 'https://www.gutenberg.org/cache/epub/1/pg1.cover.medium.jpg',
  'text/plain; charset=us-ascii': 'https://www.gutenberg.org/files/1/1-0.txt',
  'text/plain': 'https://www.gutenberg.org/ebooks/1.txt.utf-8'},
 'type': 'Text',
 'LCC': {'E201', 'JK'},
 'subjects': {'United States -- History -- Revolution, 1775-1783 -- Sources',
  'United States. Declaration of Independence'},
 'authoryearofbirth': 1743,
 'authoryearofdeath': 1826,
 'language': ['en']}

In [3]:
print("We have {0} books in the index".format(len(md)))

We have 64116 books in the index


In [4]:
type(md)

dict

In [5]:
# Now we iterate to look for fiction
filtered_md = {}

for _, book in md.items():
    fiction = False
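    # Flag the book if any subject mentions fiction. The "non" check is a
    # crude way to skip non-fiction: it also skips any subject that merely
    # contains the letters "non".
    for subject in book['subjects']:
        if "fiction" in subject.lower() and "non" not in subject.lower():
            fiction = True
            break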
    # Look for fiction in English
    if fiction and "en" in book['language']:
        filtered_md[book['id']] = book
    
print("We have {0} books in the filtered index".format(len(filtered_md)))

We have 19262 books in the filtered index
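
The filter above can also be written as a small predicate, which makes it easier to test on a handful of records. This is only a sketch: `is_english_fiction` and the `sample` records are hypothetical names, and the exclusion is narrowed from any subject containing "non" to the words "nonfiction" and "non-fiction". That narrowing is an assumption about the intent, so the resulting count would probably differ slightly from the one printed above.

```python
def is_english_fiction(book):
    """Return True if a subject mentions fiction and the book is in English."""
    subjects = [s.lower() for s in book.get("subjects", ())]
    has_fiction = any(
        # Assumption: only explicit non-fiction labels should be excluded
        "fiction" in s and "nonfiction" not in s and "non-fiction" not in s
        for s in subjects
    )
    return has_fiction and "en" in book.get("language", ())


# Toy records shaped like the entries in md (made up for illustration)
sample = {
    1: {"id": 1, "subjects": {"Science fiction"}, "language": ["en"]},
    2: {"id": 2, "subjects": {"Science fiction"}, "language": ["de"]},
    3: {"id": 3, "subjects": {"Nonfiction"}, "language": ["en"]},
    4: {"id": 4, "subjects": {"Canon law -- Fiction"}, "language": ["en"]},
}

sample_filtered = {
    book_id: book for book_id, book in sample.items() if is_english_fiction(book)
}
```

Record 4 shows the difference from the loop above: the substring check on "non" would drop it, whereas this version keeps it.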


In [6]:
for _, book in filtered_md.items():
    print(book['subjects'])

{'Science fiction'}
{'Science fiction', 'Space flight to the moon -- Fiction'}
{'Fiction'}
{'Young women -- Fiction', 'Vampires -- Fiction'}
{'Science fiction'}
{'Man-woman relationships -- Fiction', 'London (England) -- Social life and customs -- 20th century -- Fiction'}
{'City and town life -- Fiction', 'Poor -- Fiction', 'Immigrants -- Fiction', 'Short stories, American', 'Slums -- Fiction'}
{'Schools -- Juvenile fiction'}
{'Orphans -- Fiction', 'Christian life -- Fiction'}
{'Yukon -- Fiction'}
{'Southern States -- Social life and customs -- Fiction'}
{'United States Naval Academy -- Juvenile fiction'}
{'Glasgow (Scotland) -- Fiction', 'Scottish Americans -- Virginia -- Fiction', 'Virginia -- History -- Colonial period, ca. 1600-1775 -- Fiction', 'Jamestown (Va.) -- Fiction', 'Historical fiction'}
{'Inheritance and succession -- Juvenile fiction', 'Girls -- Juvenile fiction'}
{'Death -- Fiction', 'Inheritance and succession -- Fiction'}
{'Death -- Fiction'}
{'Death -- Fiction'}
{'M

{'Stradella, Alessandro, 1639-1682 -- Fiction'}
{'Fables', 'Handkerchiefs -- Fiction'}
{'Horror tales', 'Vampires -- Fiction', 'Short stories'}
{'Fairy tales', 'Flowers -- Juvenile fiction'}
{'Women physicians -- Fiction'}
{'Animals -- Juvenile fiction', 'Girls -- Conduct of life -- Juvenile fiction'}
{'Horror tales', 'Short stories', 'Magic -- Fiction', 'Pennsylvania Dutch -- Fiction'}
{'Conduct of life -- Juvenile literature', 'Birds -- Juvenile fiction', 'Dogs -- Juvenile fiction'}
{'Conduct of life -- Juvenile fiction'}
{'Manners and customs -- Fiction', 'Short stories'}
{'Manners and customs -- Fiction', 'Short stories'}
{'Manners and customs -- Fiction', 'Short stories', 'Mothers and daughters -- Fiction'}
{'Manners and customs -- Fiction', 'North Carolina -- Fiction'}
{'Manners and customs -- Fiction'}
{'Man-woman relationships -- Fiction', 'Manners and customs -- Fiction'}
{'Man-woman relationships -- Fiction'}
{'Science fiction, American'}
{'Science fiction'}
{'Extraterrestria

{'German fiction -- Translations into English'}
{'London (England) -- Fiction', 'Great Britain -- History -- Elizabeth, 1558-1603 -- Fiction'}
{'Kentucky -- Social life and customs -- Fiction'}
{'Family -- Juvenile fiction'}
{'Mexico -- History -- Conquest, 1519-1540 -- Fiction'}
{'Fiction'}
{'Princesses -- Fiction', 'Adventure stories'}
{'Mogul Empire -- Fiction'}
{'Young men -- Fiction', 'Love stories', 'Cricket -- Fiction', 'Boys -- Fiction', 'Schools -- Fiction', 'Golf stories'}
{'Fiction'}
{'Quinine -- Juvenile fiction', 'Friendship -- Juvenile fiction', 'Conduct of life -- Juvenile fiction', 'Voyages and travels -- Juvenile fiction', 'Fathers and sons -- Juvenile fiction', 'Treasure troves -- Juvenile fiction', 'South America -- Juvenile fiction', 'Youth -- Conduct of life -- Juvenile fiction', 'Incas -- Juvenile fiction', 'Adventure and adventurers -- Juvenile fiction', 'Indians of South America -- Juvenile fiction'}
{'Loggers -- Fiction', 'Michigan -- Fiction', 'Frontier and pi

{'Mystery and detective stories', 'Fiction'}
{'Riperdá, Juan Guillermo, Duke of, 1680-1737 -- Fiction', 'Wharton, Philip Wharton, Duke of, 1698-1731 -- Fiction'}
{'Short stories, American', 'Louisiana -- Social life and customs -- Fiction'}
{'San Francisco (Calif.) -- Juvenile fiction', 'Adventure stories'}
{'Aeronautics -- Juvenile fiction', 'Air pilots -- Juvenile fiction', 'Detective and mystery stories', 'Ghost stories', 'Airplanes -- Piloting -- Juvenile fiction'}
{'College stories', 'Yale University -- Fiction'}
{'Bildungsromans', 'Orphans -- Fiction', 'Kidnapping victims -- Fiction', 'Criminals -- Fiction', 'London (England) -- Fiction', 'Boys -- Fiction'}
{'Journalists -- Juvenile fiction', 'Treasure troves -- Juvenile fiction'}
{'Italy -- Fiction'}
{'Short stories, American', 'Seafaring life -- Fiction', 'Sea stories, American'}
{'Riperdá, Juan Guillermo, Duke of, 1680-1737 -- Fiction', 'Wharton, Philip Wharton, Duke of, 1698-1731 -- Fiction'}
{'Judges -- Fiction', 'Chicago (I

{'Scotland -- History -- War of Independence, 1285-1371 -- Fiction'}
{'Short stories', 'Motion pictures -- Production and direction -- Fiction', 'Trojan War -- Fiction', 'Science fiction', 'Human-alien encounters -- Fiction'}
{'Gothic fiction', 'Horror tales', 'Vampires -- Fiction'}
{'Science fiction', 'Symbiosis -- Fiction', 'Short stories', 'Psychological fiction'}
{'New York (N.Y.) -- Fiction', 'Theater -- Fiction'}
{'Science fiction', 'Gangsters -- Fiction', 'Time travel -- Fiction', 'Short stories'}
{'Families -- Juvenile fiction', 'Cousins -- Juvenile fiction', 'Boys -- Societies and clubs -- Juvenile fiction', 'Horses -- Juvenile fiction', 'Schooners -- Juvenile fiction', 'Brothers -- Juvenile fiction', 'Hunting -- Juvenile fiction'}
{'Schools -- Juvenile fiction', 'Football stories', 'Sports stories'}
{'Short stories', 'Gambling -- Fiction', 'Parapsychology -- Fiction', 'Science fiction', 'Telepathy -- Fiction', 'Cardsharping -- Fiction'}
{'England -- Social life and customs --

In [7]:
for _, book in filtered_md.items():
    print(book['title'], book['author'])

The House on the Borderland Hodgson, William Hope
A Voyage to the Moon: With Some Account of the Manners and Customs, Science and Philosophy, of the People of Morosofia, and Other Lunarians Tucker, George
La Fiammetta Boccaccio, Giovanni
Carmilla Le Fanu, Joseph Sheridan
The Mystery White, Stewart Edward
Tenterhooks Leverson, Ada
Gaslight Sonatas Hurst, Fannie
The Triple Alliance, Its Trials and Triumphs Avery, Harold
A Beautiful Possibility Black, Edith Ferguson
The Magnetic North Robins, Elizabeth
The Rivet in Grandfather's Neck: A Comedy of Limitations Cabell, James Branch
Dave Darrin's Second Year at Annapolis: Or, Two Midshipmen as Naval Academy "Youngsters" Hancock, H. Irving (Harrie Irving)
Salute to Adventurers Buchan, John
Billie Bradley and Her Inheritance; Or, The Queer Homestead at Cherry Corners Wheeler, Janet D.
Old Lady Mary: A Story of the Seen and the Unseen Oliphant, Mrs. (Margaret)
A Little Pilgrim: Stories of the Seen and the Unseen Oliphant, Mrs. (Margaret)
The Lit

The Gentle Grafter Henry, O.
The Good Comrade Silberrad, Una L.
Stories of Ships and the Sea London, Jack
Rabbi Saunderson Maclaren, Ian
The Frame Up Davis, Richard Harding
The Boy Trapper Castlemon, Harry
Autumn Nathan, Robert
The Lost House Davis, Richard Harding
A Dozen Ways Of Love Dougall, L. (Lily)
The Log of the "Jolly Polly" Davis, Richard Harding
Samantha at the World's Fair Holley, Marietta
From the Valley of the Missing White, Grace Miller
Bucky O'Connor: A Tale of the Unfenced Border Raine, William MacLeod
Genesis Piper, H. Beam
Graveyard of Dreams Piper, H. Beam
A Second Home Balzac, Honoré de
The Bridal March; One Day Bjørnson, Bjørnstjerne
The Freebooters of the Wilderness Laut, Agnes C.
Massimilla Doni Balzac, Honoré de
A Canadian Heroine, Volume 2: A Novel Coghill, Harry, Mrs.
Tales of the Chesapeake Townsend, George Alfred
A Prince of Bohemia Balzac, Honoré de
A Canadian Heroine, Volume 3: A Novel Coghill, Harry, Mrs.
Little Fuzzy Piper, H. Beam
Rip Foster in Ride the

Wikkey: A Scrap Vaders, Henrietta
Subversive Reynolds, Mack
With No Strings Attached Garrett, Randall
Americans All: Stories of American Life of To-Day None
How Janice Day Won Long, Helen Beecher
Missing Link Herbert, Frank
Uncle Wiggily and Old Mother Hubbard: Adventures of the Rabbit Gentleman with the Mother Goose Characters Garis, Howard Roger
Fostina Woodman, the Wonderful Adventurer Stanwood, Avis A. Burnham
Old Ebenezer Read, Opie Percival
The Roll-Call Of The Reef Quiller-Couch, Arthur
The Story of the Little Mamsell Niese, Charlotte
The Fête At Coqueville: 1907 Zola, Émile
Good Blood Wildenbruch, Ernst von
Rich Enough: a tale of the times Lee, Hannah Farnham Sawyer
The Servant Problem Young, Robert F.
Mistress Anne Bailey, Temple
The Cursed Patois: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Black Feather: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
The Blue Man: From "Mackinac And Lake Stories", 1899 Catherwood, Mary Hartwell
A Ho

The Man the Martians Made Long, Frank Belknap
The Martian Cabal Starzl, Roman Frederick
Dr. Sevier Cable, George Washington
The Great Hunger Bojer, Johan
The Hour of Battle Sheckley, Robert
Beside Still Waters Sheckley, Robert
Perez the Mouse Coloma, Luis
Pariah Planet Leinster, Murray
The Wings of the Dove, Volume 1 of 2 James, Henry
Traffic in Souls: A Novel of Crime and Its Cure Ball, Eustace Hale
Invasion Leinster, Murray
Loot of the Void Sloat, Edwin K.
Cost of Living Sheckley, Robert
The House Under the Sea: A Romance Pemberton, Max
Lords of the Stratosphere Burks, Arthur J.
The Story of Don Quixote Edwards, Clayton
Howards End Forster, E. M. (Edward Morgan)
The Velvet Glove Harrison, Harry
Under Arctic Ice Bates, Harry
The Night Riders: A Romance of Early Montana Cullum, Ridgwell
The Fifth String Sousa, John Philip
The Little Brown Hen Hears the Song of the Nightingale & The Golden Harvest Van Dresser, Jasmine Stone
Faro Nell and Her Friends: Wolfville Stories Lewis, Alfred Henr

The Ranch Girls at Home Again Vandercook, Margaret
The Ranch Girls in Europe Vandercook, Margaret
The Woman of Mystery Leblanc, Maurice
Mystery and Confidence: A Tale. Vol. 1 Pinchard, Elizabeth Sibthorpe
Mystery and Confidence: A Tale. Vol. 3 Pinchard, Elizabeth Sibthorpe
£19,000 Delannoy, Burford
Consequences Delafield, E. M.
The Secret of Sarek Leblanc, Maurice
Among the Meadow People Pierson, Clara Dillingham
Brenda, Her School and Her Club Reed, Helen Leah
The House of Strange Secrets: A Detective Story Bayly, A. Eric
King Spruce, A Novel Day, Holman
Quicksands Streckfuss, Adolf
The Children of Alsace (Les Oberlés) Bazin, René
Khaled, A Tale of Arabia Crawford, F. Marion (Francis Marion)
Boon, The Mind of the Race, The Wild Asses of the Devil, and The Last Trump;: Being a First Selection from the Literary Remains of George Boon, Appropriate to the Times Wells, H. G. (Herbert George)
Mystery and Confidence: A Tale. Vol. 2 Pinchard, Elizabeth Sibthorpe
The Japanese Twins Perkins, Lu

The Boy Scouts at the Panama-Pacific Exposition Goldfrap, John Henry
Memoirs of a Surrey Labourer: A Record of the Last Years of Frederick Bettesworth Sturt, George
Morag: A Tale of the Highlands of Scotland Rae, Janet Milne
The Eve of All-Hallows; Or, Adelaide of Tyrconnel, v. 2 of 3 Hartstonge, Matthew Weld
The King of the Mountains About, Edmond
Frank Before Vicksburg: The Gun-Boat Series Castlemon, Harry
At the Mercy of Tiberius Evans, Augusta J. (Augusta Jane)
Dorothy and the Wizard in Oz Baum, L. Frank (Lyman Frank)
The Bride of the Tomb, and Queenie's Terrible Secret Miller, Alex. McVeigh, Mrs.
Frank on the Prairie Castlemon, Harry
The Boy Scouts' Mountain Camp Goldfrap, John Henry
The Dull Miss Archinard Sedgwick, Anne Douglas
And Then the Town Took Off Wilson, Richard
The First Capture; or, Hauling Down the Flag of England Castlemon, Harry
The Trail-Hunter: A Tale of the Far West Aimard, Gustave
The Pirates of the Prairies: Adventures in the American Desert Aimard, Gustave
The

Mr. Wayt's Wife's Sister Harland, Marion
The Sack of Monte Carlo: An Adventure of To-day Frith, Walter
Gowrie; or, the King's Plot. James, G. P. R. (George Payne Rainsford)
The Morals of Marcus Ordeyne : a Novel Locke, William John
Absalom's Hair Bjørnson, Bjørnstjerne
Motor Matt's Launch; or, A Friend in Need Matthews, Stanley R.
The Dream Doctor Reeve, Arthur B. (Arthur Benjamin)
Buddy Jim Gordon, Elizabeth
Silver Cross Johnston, Mary
The Dark Other Weinbaum, Stanley G. (Stanley Grauman)
Falcons of Narabedla Bradley, Marion Zimmer
The Green Odyssey Farmer, Philip José
A Sub. of the R.N.R.: A Story of the Great War Westerman, Percy F. (Percy Francis)
A Thousand Degrees Below Zero Leinster, Murray
Four in Camp: A Story of Summer Adventures in the New Hampshire Woods Barbour, Ralph Henry
Quinneys' Vachell, Horace Annesley
Flute and Violin, and Other Kentucky Tales and Romances Allen, James Lane
The Dark Frigate Hawes, Charles Boardman
The Boy Scouts at the Canadian Border Goldfrap, John

It Was Marlowe: A Story of the Secret of Three Centuries Zeigler, Wilbur Gleason
A Secret of the Sea: A Novel. Vol. 2 (of 3) Speight, T. W. (Thomas Wilkinson)
A Secret of the Sea: A Novel. Vol. 3 (of 3) Speight, T. W. (Thomas Wilkinson)
Jet Plane Mystery Snell, Roy J. (Roy Judson)
Jimmy Drury: Candid Camera Detective O'Hara, David
The Rāmāyana, Volume 2. Āranya, Kishkindhā, and Sundara Kāndam Valmiki
The Story of Rustem, and other Persian hero tales from Firdusi Renninger, Elizabeth D.
A Tramp Abroad — Volume 01 Twain, Mark
The Common Lot Herrick, Robert
The Hill of Adventure Meigs, Cornelia
Jinny the Carrier Zangwill, Israel
A Tramp Abroad — Volume 02 Twain, Mark
The Adventures of Jimmy Brown Alden, W. L. (William Livingston)
The World's Illusion, Volume 2 (of 2): Ruth Wassermann, Jakob
A Tramp Abroad — Volume 03 Twain, Mark
What Norman Saw in the West Anonymous
A Tramp Abroad — Volume 04 Twain, Mark
Robin Linnet Benson, E. F. (Edward Frederic)
A Tramp Abroad — Volume 05 Twain, Mark
A

Windy McPherson's Son Anderson, Sherwood
The Rising of the Court Lawson, Henry
Darkness and Dawn England, George Allan
The Adventures of Sally Wodehouse, P. G. (Pelham Grenville)
Richard of Jamestown : a Story of the Virginia Colony Otis, James
The Newcomes: Memoirs of a Most Respectable Family Thackeray, William Makepeace
Daniel Deronda Eliot, George
Burning Daylight London, Jack
The Duke of Stockbridge: A Romance of Shays' Rebellion Bellamy, Edward
Lost on the Moon; Or, in Quest of the Field of Diamonds Rockwood, Roy
The Book of Wonder Dunsany, Lord
Toby Tyler; Or, Ten Weeks with a Circus Otis, James
Dorothy Dainty at Glenmore Brooks, Amy
The Created Legend Sologub, Fyodor
The Three Clerks Trollope, Anthony
The Last American: A Fragment from the Journal of Khan-li, Prince of Dimph-yoo-chur and Admiral in the Persian Navy Mitchell, John Ames
The Master of Silence: A Romance Bacheller, Irving
The Brother of Daphne Yates, Dornford
The Fighting Chance Chambers, Robert W. (Robert William)

In [8]:
import pickle, gzip
# Use a context manager so the gzip file is flushed and closed
with gzip.open("Fiction Books", 'wb') as fh:
    pickle.dump(filtered_md, fh, protocol=-1)

In [9]:
import os
files = os.walk(os.path.curdir)

In [10]:
f = list(files)

In [11]:
f[1]

('./2',
 ['2',
  '24',
  '9',
  '29',
  '25',
  '27',
  '22',
  '26',
  '6',
  '23',
  '7',
  '0',
  '28',
  '1',
  '5',
  '3',
  '4',
  '8'],
 [])

In [12]:
f

[('.',
  ['2',
   '9',
   '.ipynb_checkpoints',
   'etext96',
   'etext02',
   '.git',
   'etext00',
   '6',
   '7',
   '0',
   '1',
   '5',
   '3',
   '__pycache__',
   '4',
   '8'],
  ['log_gutenberg',
   'GUTINDEX.zip',
   'gutenberg.code-workspace',
   'Fiction Books',
   'rdf-files.tar.bz2',
   'BookPaths.pkl',
   '2020-05-29 - Bigram Counts.ipynb',
   'Get List of Fiction Titles.ipynb',
   'chapterize.py',
   'md.pickle.gz',
   'Notes.txt',
   'parseRDF.py',
   'Load Book and Send to Spacy.ipynb']),
 ('./2',
  ['2',
   '24',
   '9',
   '29',
   '25',
   '27',
   '22',
   '26',
   '6',
   '23',
   '7',
   '0',
   '28',
   '1',
   '5',
   '3',
   '4',
   '8'],
  []),
 ('./2/2',
  ['2',
   '9',
   '222',
   '228',
   '227',
   '223',
   '226',
   '6',
   '229',
   '7',
   '0',
   '1',
   '220',
   '221',
   '224',
   '5',
   '3',
   '4',
   '8'],
  []),
 ('./2/2/2',
  ['2',
   '2220',
   '9',
   '6',
   '7',
   '0',
   '2223',
   '1',
   '2222',
   '2227',
   '5',
   '3',
   '2224',

How to get a list of the ZIP files:

* `os.walk` returns a generator of (root, dirs, files) tuples;
* For each file, check that it is a ".zip" file and that the part of the name before the first "." is all digits (`isdigit()`), which gives the book ID;
* Record a (path, book ID) tuple for every match, where the path joins "root" and the file name.


In [13]:
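filelist = []
for root, dirs, files in os.walk(os.path.curdir):
    for file in files:
        # Only consider ZIP archives
        if file.endswith(".zip"):
            # The book ID is the part of the name before the first "."
            bookid = file.split(".")[0]
            # Skip variants such as "10002-8.zip" or "10002-h.zip"
            if bookid.isdigit():
                filelist.append((os.path.join(root, file), bookid))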

In [14]:
filelist[0:20]

[('./2/2/2/2/22225/22225.zip', '22225'),
 ('./2/2/2/2/22222/22222.zip', '22222'),
 ('./2/2/2/2/22226/22226.zip', '22226'),
 ('./2/2/2/2/22223/22223.zip', '22223'),
 ('./2/2/2/2/22220/22220.zip', '22220'),
 ('./2/2/2/2/22229/22229.zip', '22229'),
 ('./2/2/2/2/22224/22224.zip', '22224'),
 ('./2/2/2/2/22228/22228.zip', '22228'),
 ('./2/2/2/2/22227/22227.zip', '22227'),
 ('./2/2/2/2/22221/22221.zip', '22221'),
 ('./2/2/2/2220/2220.zip', '2220'),
 ('./2/2/2/9/22295/22295.zip', '22295'),
 ('./2/2/2/9/22292/22292.zip', '22292'),
 ('./2/2/2/9/22294/22294.zip', '22294'),
 ('./2/2/2/9/22298/22298.zip', '22298'),
 ('./2/2/2/9/22293/22293.zip', '22293'),
 ('./2/2/2/9/22290/22290.zip', '22290'),
 ('./2/2/2/9/22297/22297.zip', '22297'),
 ('./2/2/2/9/22291/22291.zip', '22291'),
 ('./2/2/2/6/22264/22264.zip', '22264')]
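
The listing above suggests how the mirror is laid out: every digit of the book ID except the last becomes a directory, followed by a directory named after the full ID. A minimal sketch of building the path straight from an ID, without walking the tree, might look like this. `expected_zip_path` is a hypothetical helper, the layout is inferred only from the output above, and the handling of single-digit IDs (a "0" directory) is an assumption that should be checked against the mirror.

```python
import os


def expected_zip_path(bookid, root=os.path.curdir):
    """Build the likely path of a book's main ZIP file from its numeric ID."""
    digits = str(int(bookid))
    # All digits but the last become nested directories, e.g. 22225 -> 2/2/2/2
    # Assumption: single-digit IDs live under a "0" directory
    parents = list(digits[:-1]) or ["0"]
    return os.path.join(root, *parents, digits, digits + ".zip")


example_paths = [expected_zip_path(i) for i in ("22225", "2220", "22264")]
```

A path built this way is only a guess until `os.path.exists` confirms it, since not every book necessarily has a plain `<id>.zip` file.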

In [15]:
import pickle
# Use a context manager so the file is closed after writing
with open("BookPaths.pkl", 'wb') as fh:
    pickle.dump(filelist, fh)

In [16]:
len(filelist)

38412

In [17]:
for path, bookid in filelist:
    book = filtered_md.get(int(bookid), None)
    if book:
        book['path'] = path
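
The loop above silently skips any fiction title for which no ZIP file was found, so it can be worth checking how many entries actually received a `path`. The following is a sketch on toy data: `split_by_path` and `toy_books` are hypothetical names, not part of the notebook.

```python
def split_by_path(books):
    """Split a metadata dict into entries with and without a 'path' key."""
    with_path = {k: b for k, b in books.items() if "path" in b}
    without_path = {k: b for k, b in books.items() if "path" not in b}
    return with_path, without_path


# Toy records (made up for illustration); the second one has no ZIP file
toy_books = {
    10002: {"title": "The House on the Borderland",
            "path": "./1/0/0/0/10002/10002.zip"},
    99999: {"title": "A title with no local ZIP"},
}

found, missing = split_by_path(toy_books)
print("{0} with a path, {1} without".format(len(found), len(missing)))
```

Running the same function on `filtered_md` would show how much of the fiction list is covered by the local mirror.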

In [18]:
list(filtered_md.items())[0:10]

[(10002,
  {'id': 10002,
   'author': 'Hodgson, William Hope',
   'title': 'The House on the Borderland',
   'downloads': 637,
   'formats': {'application/x-mobipocket-ebook': 'http://www.gutenberg.org/ebooks/10002.kindle.noimages',
    'application/epub+zip': 'http://www.gutenberg.org/ebooks/10002.epub.noimages',
    'image/jpeg': 'http://www.gutenberg.org/cache/epub/10002/pg10002.cover.medium.jpg',
    'text/plain': 'http://www.gutenberg.org/ebooks/10002.txt.utf-8',
    'application/zip': 'http://www.gutenberg.org/files/10002/10002-h.zip',
    'application/rdf+xml': 'http://www.gutenberg.org/ebooks/10002.rdf',
    'text/plain; charset=iso-8859-1': 'http://www.gutenberg.org/files/10002/10002-8.zip',
    'text/plain; charset=us-ascii': 'http://www.gutenberg.org/files/10002/10002.txt',
    'text/html; charset=iso-8859-1': 'http://www.gutenberg.org/files/10002/10002-h/10002-h.htm'},
   'type': 'Text',
   'LCC': {'PR'},
   'subjects': {'Science fiction'},
   'authoryearofbirth': 1877,
   

In [19]:
import pickle, gzip
# Save the fiction metadata, now including local paths, as a gzipped pickle
with gzip.open("fiction_list.gz", 'wb') as fh:
    pickle.dump(filtered_md, fh, protocol=-1)
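
To use the saved list in another notebook, the file has to be opened with `gzip` again before unpickling. Below is a minimal round-trip sketch that writes to a temporary directory rather than touching `fiction_list.gz`. The helper names and the `demo` record are hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def save_gz_pickle(obj, filename):
    """Pickle obj into a gzip-compressed file."""
    with gzip.open(filename, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gz_pickle(filename):
    """Load an object from a gzip-compressed pickle file."""
    with gzip.open(filename, "rb") as fh:
        return pickle.load(fh)


# Round trip on a toy record shaped like one entry of filtered_md
demo = {10002: {"title": "The House on the Borderland",
                "subjects": {"Science fiction"}}}

with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo_list.gz")
    save_gz_pickle(demo, demo_file)
    restored = load_gz_pickle(demo_file)
```

In a later notebook, `load_gz_pickle("fiction_list.gz")` should return the same dictionary that was saved here. As with any pickle, only load files you created yourself.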