## Topic Modelling Newspapers

We are going to use this notebook to topic model our newspaper corpus. We start by setting up our imports.

In [1]:
import gensim



In [2]:
import spacy
import os

In [3]:
import xml.etree.ElementTree as ET

### Accessing XML files

In the following cells, we assemble the important information from the XML files. The code iterates over every XML file in the selected folders and parses it, but an article is added to our corpus only if it is censorship related. The following function identifies this.

In [4]:
# corpus = {}

In [5]:
def is_censorship(text):
    # Finds out whether a text is censorship related. The check is case-sensitive, and
    # "censor" also matches "censorship", "censored" and so on, so one test covers both.
    # For each additional word we wish to add to constrain our group,
    # we add 'or "word" in text' to the end of the return line below.
    return "censor" in text

In [6]:
is_censorship("censor censorship suppress ban") # Test an individual text to see whether it falls within the 'is_censorship' group defined above

True
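
The keyword check above can also be written so that new words go into a list rather than into the condition itself. The following is a minimal sketch, not the method used in this notebook: `matches_keywords` and `CENSORSHIP_KEYWORDS` are hypothetical names, `"suppress"` is only an illustrative extra keyword, and lowercasing the text is an assumption that changes which articles match (the original check is case-sensitive).

```python
# Hypothetical keyword list; extend it to widen the group.
CENSORSHIP_KEYWORDS = ["censor", "suppress"]

def matches_keywords(text, keywords=CENSORSHIP_KEYWORDS):
    """Return True if any keyword occurs in the text, ignoring case."""
    if text is None:  # an XML element without text yields None
        return False
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)

print(matches_keywords("Police Censorship."))
print(matches_keywords("Tariff on Tooth Brushes."))
```

Because the keywords are substrings, `"censor"` still covers "censorship" and "censored"; a short keyword such as `"ban"` would also match unrelated words like "urban", so the list is worth checking against a few sample texts.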

In [7]:
2 + 3

5

In [8]:
texts = [] # We are creating an empty list called "texts"

In [9]:
# i = 0 # Here we include the entire NYT database in our corpus
# files = {}
# for folder in os.listdir("NYT"):
#     for filename in os.listdir("NYT/" + folder):
#         if filename.endswith(".xml"):
#             tree = ET.parse("NYT/" + folder + "/" +filename)
#             root = tree.getroot()
#             try:
#                 if is_censorship(root[-1].text):
#                     files[filename] = []
#                     files[filename].append(root[-1].text)
#                     files[filename].append(root[3].text)
#                     files[filename].append(root[4].text)
#                     # add it to corpus
#             except IndexError:
#                 continue

In [10]:
i = 0 # Here we selectively add folders to our corpus; 'i' is only used by the commented-out cap at the end of this cell
files = {}
folders = [
    "NYT/sm_55428_1120/","NYT/sm_55428_1021/","NYT/sm_55428_1022/","NYT/sm_55428_1023/",
    "NYT/sm_55428_1024/","NYT/sm_55428_1025/","NYT/sm_55428_1026/","NYT/sm_55428_1027/",
    "NYT/sm_55428_1028/","NYT/sm_55428_1029/","NYT/sm_55428_1030/","NYT/sm_55428_1031/",
    "NYT/sm_55428_1032/","NYT/sm_55428_1033/","NYT/sm_55428_1034/","NYT/sm_55428_1035/",
]
for folder in folders:
    for filename in os.listdir(folder):
        if filename.endswith(".xml"):
            tree = ET.parse(folder + filename)
            root = tree.getroot()
            try:
                # '-1' signifies the last element in the .xml file, which is <FullText>
                if is_censorship(root[-1].text):
                    # add it to corpus; the list is built in one step so that an
                    # IndexError on root[3] or root[4] cannot leave a partial entry behind
                    files[filename] = [root[-1].text, root[3].text, root[4].text]
            except IndexError:
                continue
#             i += 1 # for this version we only run 10000 iterations and break after th
#         if i == 10004:
#             break
#     if i == 10004:
#         break    
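
Positional indexing such as `root[3]` only works if every record has the same child elements in the same order. Looking elements up by tag name is one alternative. The sketch below is an illustration under assumptions, not the notebook's method: the tag names `Title` and `Date` and the helper `extract_fields` are hypothetical (only `<FullText>` is named in this notebook), and the record is an invented inline sample rather than a real NYT file.

```python
import xml.etree.ElementTree as ET

# Invented sample record; real files may use different tag names.
SAMPLE = """<Record>
  <Title>Police Censorship.</Title>
  <Date>Feb 5, 1917</Date>
  <FullText>The censor stopped the dispatch.</FullText>
</Record>"""

def extract_fields(root):
    """Return [full text, title, date], using "" for any missing element."""
    return [
        root.findtext("FullText", default=""),
        root.findtext("Title", default=""),
        root.findtext("Date", default=""),
    ]

root = ET.fromstring(SAMPLE)
fields = extract_fields(root)
print(fields[1] + "\t" + fields[2])
```

With a lookup by name, a record that lacks a title would yield an empty string in that slot instead of shifting another element into it.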

In [11]:
# i = 0 # Here we selectively add folders to our corpus
# folders = ["NYT/sm_55428_1004/","NYT/sm_55428_1005/"]
# for folder in folders:
#     for filename in os.listdir(folder):
#         if filename.endswith(".txt"):
#             files[filename] = []
#             try:
#                 if is_censorship(filename):
#                     file = open(filename, "r") 
#                     files[filename].append(file.read())
#                 # add it to corpus; '-1' signifies the last element in the .xml file, which is <FullText>
#             except IndexError:
#                 continue
#     i += 1 # for this version we only run 10000 iterations and break after th
#     if i == 10004:
#         break

In [12]:
# os.listdir("NYT/sm_55428_1004/") # to view lists

In [13]:
for file in files: # Here we can see the list of articles that are related to censorship and will be topic modeled
    print(file + "\t" + files[file][1] + "\t" + files[file][2])

sm_55428_1120-10057.xml	Feb 27, 1921	19210227

sm_55428_1120-10065.xml	British Observers Find Spartanburg Men Lax in Trench Duty.	Nov 26, 1917

sm_55428_1120-10067.xml	IMPOSSIBLE APOLLO Latest Works of Fiction THE SHIELD OF SILENCE	Jun 5, 1921

sm_55428_1120-1021.xml	Theosophist In India Declines Concession After Being Expelled.	Jul 12, 1917

sm_55428_1120-10250.xml	Many Killed as Result of Clashes in Barcelona and Sabadell. ARREST SOCIALIST LEADERS Accused of Spreading Sedition-- Government Bringing Order in Madrid.	Aug 16, 1917

sm_55428_1120-13782.xml	Step Taken in Preparation for Possible Hostilities with Germany. LITTLE TRADING OF LATE No Sensational Aspect Attached to Elimination of the Reichsmark Market. Mrs. Emerson in Belgian Relief.	Mar 30, 1917

sm_55428_1120-1380.xml	Letters in Code or Invisible Ink Carried by Scandinavian Seamen for Pay. INFORMATION FOR GERMANY One Letter in Every Five Intrusted to Messengers Suspicious--Many Arrests Due.	Dec 23, 1917

sm_55428_1120-14013.xml	Tariff on Tooth Brushes.	May 30, 1922

sm_55428_1120-19839.xml	Washington Aroused by Publication of Arrival of Forces in France. GOVERNORS GOT MESSAGES Colonel and Lisutenant Colonel as Well as Censors Must Explain. OFFICIAL REQUEST IGNORED Dispatches Heralded in Many Newspapers Despite Department's Pica. Predicts Drastic Action. TO TRY OFFICERS GIVING TROOP NEWS Why News Is Kept Secret.	Oct 15, 1917

sm_55428_1120-19956.xml	Hohenzollerns and Hapsburgs Will Follow Romanoffs, He Declares. MASS MEETING FOR RUSSIA Cable Message from President Lvoff Cheered by 1,500 Celebrating Success of the Rovolution. Message from Roosevelt. Why Grand Duke Was Removed.	Mar 26, 1917

sm_55428_1120-25475.xml	Officials Take Action to Insure Safety of Arsenals and Other Federal Property. NO ACCESS TO WHITE HOUSE Public Barred from State, War, and Navy Buildings;-Movements of Warships Secret.WATCH ON GERMAN AGENTS Department of Justice Has Many Under Surveillance of Its Investigation Division. Manufacturers Offer Their Plants. Messages Sent to Atlantic Fleet. Public Buildings Closed to Tourists. Keeping Watch on German Agents. Will Guard Docks and Railways. Cruiser Des Moines at Alexandria. BARS FRANKFORD ARSENAL. Commandant Keeps Out Visitors;-Cards for Employes. NO MORE FOREIGN CONTRACTS. General Electric Company Issues Order;-Westinghouse Plants Take Inventories.	Feb 4, 1917

sm_55428_1120-29646.xml	Mrs. Alec Tweedie's New Year Message Is, Hurry Up! The Prussian Way. Colonel House in College. The Solid South. NORWAY AND AMERICA. A Reply to an Attack by a Norwegian Writer Called Forth by the Food Shipment Negotiations. THE RAILROADS. They Need a Unify	Jan 31, 1918

sm_55428_1120-29809.xml	Effort to Cheer Up Sing Sing Convicts for Tonight's Executions.	Feb 2, 1922

sm_55428_1120-29979.xml	Apr 27, 1917	19170427

sm_55428_1120-30334.xml	Jul 16, 1917	19170716

sm_55428_1120-30845.xml	Charle

sm_55428_1120-35024.xml	Aqueduct System from Croton South Watched by Militiamen. WHOLE GUARD NOT OUT 2,000 Special Officers Protect Subway System;-Armed Tugs in Harbor. Only Part of Guard Called. Close Watch on Bridges. ARMED GUARDS PATROL BRIDGES Orders to Boat Owners. Tug Patrol in East River. Police Censorship. Co-operate with Neutrality Squad. Aviation Field Closed to Visitors.	Feb 5, 1917

sm_55428_1120-35030.xml	Companies Agree to Co-operate with Government;-'Phone Lines to Mexico Included.	Apr 26, 1917

sm_55428_1120-3721.xml	NATIONAL EXECUTIVE. National Politics HUMAN GAINS AND LOSSES FOR THE YEAR Woman Suffrage Prohibition: Liquid and Social In Mexico Vital Statistics Dead in 1920	Jan 2, 1921

sm_55428_1120-3733.xml	Dec 3, 1922	19221203

sm_55428_1120-37565.xml	American Mandate Recommended in DocumentSent to Wilson.PEOPLE CALLED FOR USDisliked French, DistrustedBritish and Opposed theZionist Plan.ALLIES AT CROSS PURPOSES Our Control Would Have Hid Its Seat in Constantinople, Dominating New Nations. INTRODUCTION. Text of the Long-Hidden Crane-King Report on the Near East TEXT OF THE REPORT. Document Sent to President Recommended an American Mandate Report on Mesopotamia.	Dec 3, 1922

sm_55428_1120-41197.xml	Jul 20, 1922	19220720

sm_55428_1120-41461.xml	May 26, 1922	19220526

sm_55428_1120-41804.xml	New Cabinet Is Resolved to Double Efforts, Clemenceau Tells Deputies. SWIFT DOOM FOR PLOTTERS "Crimes" at Home Not to Hamper Army, Says Premier--Sustained by 418 to 65. 'NOTHING BUT WAR,' PLEDGE TO FRANCE No More "Treason Nor Semi-Treason." New Regime of Sacrifice.	Nov 21, 1917

sm_55428_1120-41895.xml	Police Refuse to Let Patrons Seized in Raid Telephone Friends. FOUND GUILTLESS IN COURT Detectives' Only Charge Is That They Thought There Was Improper Dancing. Lined Them Up Against Wal

sm_55428_1120-44002.xml	Shortage of Food, High Prices, and Much Misery Revealed. A REPLY TO DR. FLEXNER. Latin and Greek Taught as Well as Other Subjects.	Feb 5, 1917

sm_55428_1120-44205.xml	Third Floor of Morris Building Now Used for That Service.	Aug 3, 1917

sm_55428_1120-44761.xml	Ex-Justice Stirs Publishers, Asserting That "the Victory Is Ours." FAVORS A HUGE ARMY Would Send 5,000,000 MenAbroad to Overcome theGerman Foe.BAKER LAUDS OUR WORK Tells of Mammoth Preparations forMen at the Front--Daniels Pays Honest Criticism Defended. Hughes Discusses Sedition. Aimed Gun at Heart of France. Says Treason Must Be Punished. Sees Field for Honest Criticism. No Place for Partisanship. Says Country Should Have Facts. Trime of Rare Privi

sm_55428_1120-506.xml	Comes Unexpectedly to See for Himself Conditions at Bronx Institution. PROMISES TO PUNISH LAXITY Tells of Administrative Troubles-- Investigators Hear Legion Witness. Promises to Punish the Guilty. To Ask for $5,000,000. Deegan Asks Improvements.	Sep 15, 1922

sm_55428_1120-5075.xml	Mar 13, 1921	19210313

sm_55428_1120-5141.xml	Apr 21, 1922	19220421

sm_55428_1120-5402.xml	Central News Denies Filing It, but Senator Replies He Has It.	Jul 27, 1917

sm_55428_1120-5699.xml	Brooklyn Canon Asks Tammany Senator H

sm_55428_1120-7622.xml	Was Told Not to Let British Pull Wool Over His Eyes, He Says.READS LETTER TO DANIELSDeclines to Name the "Personin Authority" Who Gave theInstruction Quoted.ASSERTS HE WAS HAMPERED Members of Congress DeploreDevelopment in the Disputeas Unfortunate. Congressmen Deplore Statement. SIMS STARTLES SENATORS Wanted Ships Massed Abroad. Royal Road to Victory" Sought. Daniels Invited Suggestions. Text of Sim's Memorandum. SIMS STARTLES SENATORS Tells of "Admonition" on British. Charges Delay in Full Co-operation. Intricate Details Called For." Refers to Dispatches and Letters. Says Convoy Plan Was Delayed. Advised on U-boat Danger Here. Emphasizes His Lack of Help. SIMS STARTLES SENATORS Says He Had Impossible Task. Sims Tells of Letter to Bagley.	Jan 18, 1920

sm_55428_1021-2567.xml	THE CASE OF PORTILLO WAR NEWS PERSECUTION OF THE PRESS GENERAL GOSSIP. THE WAR. A SUSPICIOUS CRAFT. PERSECUTION OF THE PRESS. THE REPUBLICAN PARTY. MISCELLANEOUS GOSSIP.	Jul 18, 1873

sm_55428_1021-25915.xml	Aug 4, 1877	18770804

sm_55428_1021-26582.xml	DETAILS OF THE CAPTURE OF LAS TUNAS BY THE INSURGENTS THE GOVERNOR ASSASSINATED BY HIS OWN SOLDIERS DISCONTENT AND INSUBORDINATION AT PUERTO FRINCIPE.	Oct 7, 1876

sm_55428_1021-26937.xml	WHO THEY ARE, AND HOW THEY LIVE THEIR RELATIONS TO THE PRESENT STRUGGLE--THE MANNERS OF A PEOPLE LIT

sm_55428_1021-38074.xml	Dec 26, 1879	18791226

sm_55428_1021-40610.xml	THE ORATION, POEM, AND OTHER EXERCISES--COLLEGE FINANCES AND FACULTY CHANGES--THE OUTGOING AND INCOMING CLASSES--INTERESTING STATISTICS.	Jun 27, 1877

sm_55428_1021-41642.xml	HINTS ON DRESS-MAKING. THE QUESTION OF TRIMMINGS THE STYLE IN SPRING SUITS NOVEL BUTTONS NEWEST BONNETS MISCELLANEOUS TOILETS.	Mar 18, 1877

sm_55428_1021-42451.xml	FINAL ADJOUNMENT OF THE LEGISLATURE. THE CLOSING SCENES SPEAKER HUSTED LEAVES THE CHAIR MORE POPULAR THAN EVER THE POLICE BILL.	May 1, 1874

sm_55428_1022-17237.xml	FINANCIAL, RELIGIOUS, AND ARTISTIC TOPICS.THE POLICY OF THE BANK OF ENGLAND THE CASE OF DR. COLENSO THEATRICAL INDECENCIES.	Dec 18, 1874

sm_55428_1022-17617.xml	May 21, 1873	18730521

sm_55428_1022-17839.xml	RECENT HOLIDAYS THE AMINISTRATION OF JUSTICE.PENTECOSTAL INCIDENTS THE RACES CORONATION OF THE NANTERRE ROSIERE A PERSECUTED PUBLISHER WHERE IMMORALITY BEGINS THE STORY OF TWO POLICEMEN. DEATH FROM THE BITE OF A CAT.	Jun 1, 1875

sm_55428_1022-18164.xml	Jul 14, 1874	18740714

sm_55428_1022-47768.xml	A HEAD SEVERED FROM A BODY. A GREENPOINT WAGON-MAKER MURDERED FINDING OF HIS HEAD WRAPPED IN AN OLD NEWSPAPER THE BROOKLYN POLICE WITHOUT A CLUE THEORIES REGARDING THE TRAGEDY. THE BOSTON FORGER'S ESCAPE. BUSINESS EMBARRASSMENTS. PROCEEEINGS IN BANKRUPTCY. ASSIGNMENTS AND JUDGEMENTS. THE FAILURE OF A CINCINNATI FIRM.	Jan 30, 1876

sm_55428_1022-4908.xml	May 21, 1878	18780521

sm_55428_1022-49540.xml	Sep 2, 1876	18760902

sm_55428_1022-5884.xml	Dec 3, 1876	18761203

sm_55428_1022-5973.x

sm_55428_1023-12643.xml	CLOSING THE THEATRES. FOREIGN MUSICAL NOTES.	Jun 30, 1879

sm_55428_1023-13315.xml	THE ORATION, POEM, AND OTHER EXERCISES--COLLEGE FINANCES AND FACULTY CHANGES--THE OUTGOING AND INCOMING CLASSES--INTERESTING STATISTICS.	Jun 27, 1877

sm_55428_1023-15221.xml	May 7, 1876	18760507

sm_55428_1023-15575.xml	A FLOOD OF PRIVATE BILLS POURED INTO ASSEMBLY AND SENATE LAST NIGHT.	Feb 16, 1875

sm_55428_1023-1561.xml	NEW-YORK AND OSWEGO MIDLAND. A PLAN SUBMITTED FOR THE REORGANIZATION OF THE ROAD--DETAILS OF THE SCHEME--OBLIGATIONS OF BONDHOLDERS--THE LIMIT OF TIME.	Nov 17, 1878

sm_55428_1023-3927.xml	ARTISTS IN THE ANCIENT CITY. AMBITIOUS STUDENTS BREATHING AN ATMOSPHERE OF ART A ROMAN ACADEMY WITHOUT A SCHOOL APPLIANCE AND MEANS FOR INSTRUCTION THE ANNUAL EXPOSITIONS THE SHODDY ELEMENT IN SOCIETY.	Aug 3, 1876

sm_55428_1023-40981.xml	May 7, 1876	18760507

sm_55428_1023-41749.xml	Dec 16, 1878	18781216

sm_55428_1023-43399.xml	HOW A MEMBER-ELECT WAS GREETED WITH LOUD LAUGHTER SEARCHING FOR HIS CREDENTIALS.	Mar 20, 1877

sm_55428_1023-44495.xml	Jul 22, 1878	18780722

sm_55428_1023-44

sm_55428_1023-688.xml	ITALY. OPENING OF PARLIAMENT THE KING'S SPEECH. THE GALE IN THE MEDITERRANEAN. FRANCE. THE REPUBLICANS CARRY THE MUNICIPAL ELECTIONS GENERALLY. DANGEROUS ILLNESS OF BLANQUI THE IMPRISONED COMMUNIST. GREAT BRITAIN. CONTINUANCE OF THE FOGS TRAVEL ON LAND AND WATER DANGEROUS. BAPTISM OF THE DUKE OF EDINBURGH'S SON. COMMEMORATING THE EXECUTION OF FENIANS. THE NEW POLRR EXPEDITION LADY FRANKLIN'S OFFER. SPAIN. SERRANO GOING NORTH TO EXPEDITE MILITARY OPERATIONS WATCHING THE ADHERENTS OF EX-QUEEN ISABELLA. GERMANY. THE REPLY TO THE SPANISH NOTE THE SPENER GAZETTE AND VON ARNIM. BRAZIL. DEMONSTRATION IN PARA AGAINST PORTUGUESE, AND THE PORTUGUESE CORVETTE SENT TO PROTECT THEM. THE ARGENTINE CONFEDERATION. THE END OF THE REBELLION REPORTED. AFGHANISTAN. SERIOUS COMPLICATIONS FEARED. CUBA. AN AMERICAN BRIG IN A HURRICANE THE MATE DROWNED.	Nov 24, 1874

sm_55428_1024-3446.xml	GOOD AND BAD POINTS OF NEWSPAPERS FROM A CLERGYMAN'S STAND-POINT--SERMON BY REV. R. HEBER NEWTON.	Feb 17, 1879

sm_55428_1024-36029.xml	Jan 14, 1877	18770114

sm_55428_1024-36214.xml	Jun 24, 1876	18760624

sm_55428_1024-36970.xml	HE DENOUNCES PLAY-HOUSES AND EXHORTS PLAYERS TO GIVE UP THEIR PROFESSION AND PREACH THE GOSPEL.	Jan 15, 1877

sm_55428_1024-37371.xml	THE TERRIFIC BATTLE OF PISAGUA. STORMING THE HEIGHTS--FIERCE VALOR OF THE CHILIANS--DISGRACEFUL ATTACK UPON WOMEN IN LIMA--SOME RESULTS OF THE WAR.	Jan 3, 1880

sm_55428_1025-14747.xml	THE TRIAL OF REV. C.P. M'CARTHY. EXTRAORDINARY PROCEEDINGS IN THE THIRD UNIVERSALIST CHURCH M. M'CARTHY ANNOUNCES THAT HE HAS NO RESPECT FOR CERTAIN MEMBERS OF THE COMMITTEE HE DENOUNCES THEIR ACTION AND THAT OF HIS PROSECUTOR SEVERELY.	May 24, 1877

sm_55428_1025-14920.xml	May 13, 1877	18770513

sm_55428_1025-15487.xml	Oct 1, 1879	18791001

sm_55428_1025-15540.xml	Jun 5, 1878	18780605

sm_55428_1025-17018.xml	Feb 17, 1879	18790217

sm_55428_1025-17477.xml	THE RECORD OF A HERO OF THE MEXICAN WAR THREE YEARS' SUBSEQUENT 

sm_55428_1025-40657.xml	MRS. DORSEY'S BEQUEST TO JEFFERSON DAVIS.HOW THE REBEL EX-PRESIDENT HAS LATELY LIVED AND WORKED HIS HABITS AND SURROUNDINGS.	Jul 22, 1879

sm_55428_1025-40764.xml	AN OUTLINE OF THE CHIEF WORK OF THE SEASON.	Jun 13, 1880

sm_55428_1025-41223.xml	THE APPORTIONMENT BILLS. A VIGOROUS AND EARNEST DEBATE IN THE ASSEMBLY--THE TACTICS OF THE DEMOCRATIC MEMBERS--THE MAJORITY REPORTS AGREED TO AND THE BILLS ORDERED TO A THIRD READING.	May 2, 1876

sm_55428_1025-41388.xml	Sep 19, 1880	18800919

sm_55428_1026-26806.xml	Oct 25, 1879	18791025

sm_55428_1026-26941.xml	May 1, 1881	18810501

sm_55428_1026-27769.xml	THE BALLET AND MEMORIES THAT IT SUGGESTS. REVIVAL OF DANCING IN THE OPERA THE SITE OF TEMPLE BAR AND THE CHANGES IT HAS SEEN KEW GARDENS AND THEIR FAMOUS DIRECTOR.	Nov 28, 1879

sm_55428_1026-28035.xml	Feb 5, 1878	18780205

sm_55428_1026-28098.xml	SUPERSTITIONS. THE CAT. LATE MAGAZINES. NEW BOOKS.	May 1, 1881

sm_55428_1026-29853.xml	EXCITMENT CAUSED BY BISHOP PINKNEY'S FIRST OFFICAL ACT.	Dec 20, 1879

sm_55428_1026-7680.xml	CAPE MAY NEARLY DESTROYED BY FIRE. SIX HOTELS, TWENTY COTTAGES, AND MANY SMALLER BUILDINGS BURNED--THE LOSS ESTIMATED AT $400,000--SUPPOSED INCENDIARY ORIGIN OF THE FIRE--LIST OF LOSSES AND INSURANCES. THE BURNED WATERING-PLACE.	Nov 10, 1878

sm_55428_1026-8015.xml	Dec 26, 1880	18801226

sm_55428_1026-9438.xml	AN AUTHORITATIVE ACCOUNT OF EDWIN M. STANTON'S DEATH.	Apr 20, 1879

sm_55428_1027-10620.xml	THE CASE OF D.M. BENNETT CHRISTIAN CREEDS DENOUNCED AS TYRANNICAL.	Jun 4, 1879

sm_55428_1027-47867.xml	CLOSING THE THEATRES. FOREIGN MUSICAL NOTES.	Jun 30, 1879

sm_55428_1027-48865.xml	Dec 9, 1877	18771209

sm_55428_1027-49805.xml	HOLDING A VISITATION IN THE CENTRAL PRESBYTERIAN CHURCH.	Mar 12, 1880

sm_55428_1027-5918.xml	MILITARY RULE IN RUSSIA. MARTIAL LAW BUT NOT A REIGN OF TERROR HOW EXAGGERATED REPORTS GET ABROAD.	May 30, 1879

sm_55428_1027-6215.xml	HORACE H. DAY.	Aug 27, 1878

sm_55428_1027-6346.xml	MUSICAL AND DRAMATIC. THE ORATORIO SOCIETY. CHAMBER MUSIC. UNION-SQUARE THEATRE. MUSICAL NOTES.	Apr 18, 1879

sm_55428_1028-33246.xml	WHAT SORT OF A STATE NEW-MEXICO WOULD MAKE.	Feb 6, 1882

sm_55428_1028-33966.xml	FIRST ANNUAL MEETING OF THE AMERICAN SURGICAL SOCIETY.	Sep 14, 1881

sm_55428_1028-34080.xml	Jan 15, 1881	18810115

sm_55428_1028-34178.xml	Dec 11, 1881	18811211

sm_55428_1028-36262.xml	Dec 21, 1884	18841221

sm_55428_1028-37475.xml	SOME ACCOUNT OF THE GREAT LIBRARY. HOW THE BOOKS WERE COLLECTED AND HOW MANY OF THEM ARE ALMOST PRICELESS--NOT A "POPULAR" LIBRARY, BUT A VERY SOLID ONE.	Oct 30, 1881

sm_55428_1029-24561.xml	AN ASSAULT GROWING OUT OF A SCANDAL ON THE EASTERN SHORE OF MARYLAND.	Apr 7, 1881

sm_55428_1029-25407.xml	HE TELLS A CHICAGO AUDIENCE WHAT HE THINKS OF ENGLAND.	May 6, 1883

sm_55428_1029-25833.xml	Aug 3, 1882	18820803

sm_55428_1029-26026.xml	CLOUDS OF DUST, THE SMALL-POX, AND TABLE MOUNTAIN.	Dec 31, 1882

sm_55428_1029-26235.xml	Sep 8, 1882	18820908

sm_55428_1029-27284.xml	Dec 21, 1884	18841221

sm_55428_1029-2758.xml	PROVISIONS OF THE WILL OF THE DEAD PHILANTHROPIST.	Aug 25, 1881

sm_55428_1030-14021.xml	Nov 15, 1881	18811115sm_55428_1030-14021.xml	Nov 15, 1881	18811115

sm_55428_1030-17363.xml	Aug 17, 1883	18830817sm_55428_1030-17363.xml	Aug 17, 1883	18830817

sm_55428_1030-17706.xml	Jan 13, 1879	18790113sm_55428_1030-17706.xml	Jan 13, 1879	18790113

sm_55428_1030-17985.xml	INCIDENTS OF THE STAGE AND OF REAL LIFE. LENTEN FARE AT THE THEATRES OLD AND NEW PIECES "FATINITZA," A BRILLIANT ATTRACTION FROM VIENNA SCANDAL, CRIME, MYSTERY, AND MADNESS.	Apr 7, 1879sm_55428_1030-17985.xml	INCIDENTS OF THE STAGE AND OF REAL LIFE. LENTEN FARE AT THE THEATRES OLD AND NEW PIECES "FATINITZA," A BRILLIANT ATTRACTION FROM VIENNA SCANDAL, CRIME, MYSTERY, AND MADNESS.	Apr 7, 1879

sm_55428_1030-18350.xml	Sep 23, 1882	18820923sm_55428_1030-18350.xml	Sep 23, 1882	18820923

sm_55428_1030-18864.xml	Apr 16, 1882	18820416sm_55428_1030-18864.xml	Apr 16, 1882	18820416

sm_55428_1030-19220.xml	Jun 4, 1883	18830604sm_55428_1030-19220.xml	Jun 4, 1883	18830604

sm_55428_1030-20203.xml	CONFES

sm_55428_1030-48372.xml	GEN. WOODFORD DECLARES HE NEVER SLANDERED OR CRITICISED THE PRESIDENT.	Dec 31, 1882sm_55428_1030-49779.xml	FATHER O'FARRELL'S GIFT TO THE CATHOLIC CHAPEL OF ST. THERESA.	Jun 26, 1882

sm_55428_1030-49779.xml	FATHER O'FARRELL'S GIFT TO THE CATHOLIC CHAPEL OF ST. THERESA.	Jun 26, 1882sm_55428_1030-519.xml	THE RAILWAY WAR STILL FIERCELY WAGED. THE ST. PAUL POOL BROKEN. THE NORTH-WESTERN POOL. A WAR BETWEEN THE RIVAL RAILROAD COMPANIES--THE CAUSES OF IT. DIFFERENTIAL RATES. THE QUESTION DISCUSSED BY THE RAILROAD MEN YESTERDAY. ERIE'S ROUTE TO THE WEST. GENERAL RAILWAY NEWS.	Nov 26, 1882

sm_55428_1030-519.xml	THE RAILWAY WAR STILL FIERCELY WAGED. THE ST. PAUL POOL BROKEN. THE NORTH-WESTERN POOL. A WAR BETWEEN THE RIVAL RAILROAD COMPANIES--THE CAUSES OF IT. DIFFERENTIAL RATES. THE QUESTION DISCUSSED BY THE RAILROAD MEN YESTERDAY. ERIE'S ROUTE TO THE WEST. GENERAL RAILWAY NEWS.	Nov 26, 1882sm_55428_1030-5604.xml	THE FRENCH HOSTILITIES AGAINST TONQUIN.	Aug 4, 1883

sm_

sm_55428_1031-36521.xml	FAST FREIGHT LINES. WATERED STOCK.	Jan 23, 1880

sm_55428_1031-37041.xml	A STRONG OPINION FROM CHIEFJUSTICE DAVIS. THE TEXT OF THE OPINION. FURTHER EVIDENCES OF FORGERY. MR. HEWITT AND THE LETTER.	Nov 14, 1880

sm_55428_1031-37057.xml	DESTRUCTION IN NEW-ENGLAND AND THIS STATE.	Jan 28, 1882

sm_55428_1031-37772.xml	HEALTH AND DISEASE. MARION HARLAND'S LOITERINGS. L'ART. THE NORTH AMERICAN REVIEW. THE SAVAGE.	May 23, 1880

sm_55428_1031-37859.xml	Nov 13, 1881	18811113

sm_55428_1031-38112

sm_55428_1032-28746.xml	MME. DE REMUSAT'S LETTERS.	Aug 21, 1881

sm_55428_1032-29950.xml	WHY FORTY AMERICAN STUDENTS QUIT ST. LAURENT COLLEGE.	Oct 25, 1885

sm_55428_1032-33110.xml	A FAIRY SPOT IN THE BROAD ST. LAWRENCE.	Aug 11, 1881

sm_55428_1032-33615.xml	THE SETTLEMENT REGARDED AS A MAKE-SHIFT.	Sep 15, 1881

sm_55428_1032-34380.xml	Dec 6, 1885	18851206

sm_55428_1032-34823.xml	Feb 11, 1884	18840211

sm_55428_1032-35135.xml	Feb 3, 1884	18840203

sm_55428_1032-3583.xml	PROGRESS OF THE MILITARY AND DIPLOMATIC OPERATIONS.	Jul 31, 1882

sm_55428_1033-19274.xml	GRAVE FEARS FOR THE SAFETY OF THE BRITISH COLUMN. THE BATTLE OF ABU-KLEA NOT A DECISIVE BRITISH VICTORY--GEN. STEWART INTRENCHED--NEWS SUPPRESSED.	Jan 24, 1885

sm_55428_1033-19498.xml	THE USEFULNESS OF THESE PERSONS IN THE SOCIAL WORLD.	Aug 6, 1883

sm_55428_1033-21499.xml	Aug 19, 1883	18830819

sm_55428_1033-22207.xml	THE HON. CHAUNCEY M. DEPEW BEFORE THE STATE EDITORS.	Jun 20, 1883

sm_55428_1033-23024.xml	Dec 23, 1883	18831223

sm_55428_1033-23976.xml	Jan 8, 1883	18830108



sm_55428_1034-32371.xml	Apr 26, 1880	18800426

sm_55428_1034-33586.xml	TWO CANDIDATES QUARRELING OVER A VACANT CAPTAINCY.	Apr 3, 1883

sm_55428_1034-35252.xml	ASSEMBLYMEN WHOSE VIRTUE CRIED FOR VINDICATION. AN OUTPOURING OF BILE AND A VOTE OF CENSURE--A STOLEN BILL--THREE HEADS FOR THE PARK DEPARTMENT.	May 16, 1884

sm_55428_1034-35324.xml	Mar 17, 1884	18840317

sm_55428_1034-36273.xml	Dec 7, 1881	18811207

sm_55428_1034-36625.xml	THE DAY AT GLEN ISLAND, ROCKAWAY, AND CONEY ISLAND.	Jul 30, 1883

sm_55428_1034-3708.xml	Feb 28, 1880	18800228

sm_55428_10

sm_55428_1035-23966.xml	CONFLICT BETWEEN INSURGENTS AND LOYAL TROOPS IN SPAIN. SEEKING SAFETY IN FLIGHT TO OTHER COUNTRIES--THE MADRID GARRISON REVIEWED BY THE KING.	Aug 14, 1883

sm_55428_1035-24024.xml	Jun 2, 1882	18820602

sm_55428_1035-24975.xml	JAMES FENIMORE COOPER.	Dec 27, 1882

sm_55428_1035-25485.xml	CONFLICT BETWEEN INSURGENTS AND LOYAL TROOPS IN SPAIN. SEEKING SAFETY IN FLIGHT TO OTHER COUNTRIES--THE MADRID GARRISON REVIEWED BY THE KING.	Aug 14, 1883

sm_55428_1035-25608.xml	THE CHIEF DEMOCRATIC OFFICEGRABBING MEASURE.	May 1

In [14]:
len(files)

1099

In [15]:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

In [16]:
years = {}

In [17]:
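for file in files:
    # The date string (e.g. "Aug 25, 1881") sits at index 1 or index 2 of each record.
    # Caveats: `month in ...` is a substring test, so a headline containing "Mar" or "May"
    # also matches and the text after its first ", " is taken as the year; and if nothing
    # matches, `year` silently keeps the value from the previous file.
    for month in months:
        try:
            if month in files[file][1]:
                year = files[file][1].split(", ")[1]
                break
            if month in files[file][2]:
                year = files[file][2].split(", ")[1]
                break
        except IndexError:
            continue
    # tally how many articles fall in each year
    if year in years:
        years[year] += 1
    else:
        years[year] = 1
    print(file + "\t" + year + "\n")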

sm_55428_1120-10057.xml	1921

sm_55428_1120-10065.xml	1917

sm_55428_1120-10067.xml	1921

sm_55428_1120-1021.xml	1917

sm_55428_1120-10250.xml	1917

sm_55428_1120-10299.xml	1922

sm_55428_1120-10735.xml	1917

sm_55428_1120-11197.xml	1917

sm_55428_1120-11240.xml	1922

sm_55428_1120-11882.xml	1922

sm_55428_1120-11933.xml	1917

sm_55428_1120-12210.xml	1922

sm_55428_1120-12324.xml	1921

sm_55428_1120-12716.xml	1922

sm_55428_1120-12840.xml	1922

sm_55428_1120-1292.xml	Says Marshall. THINKS CHANGE PERMANENT Revolution Too Deep-Seated to Yield to R



sm_55428_1120-35635.xml	1921

sm_55428_1120-35734.xml	1921

sm_55428_1120-35826.xml	1917

sm_55428_1120-36137.xml	1922

sm_55428_1120-36187.xml	1922

sm_55428_1120-36448.xml	1921

sm_55428_1120-36479.xml	1921

sm_55428_1120-36576.xml	1922

sm_55428_1120-36862.xml	1922

sm_55428_1120-36900.xml	1922

sm_55428_1120-37038.xml	1917

sm_55428_1120-37106.xml	1922

sm_55428_1120-3721.xml	1921

sm_55428_1120-3733.xml	1922

sm_55428_1120-37565.xml	1922

sm_55428_1120-38061.xml	1918

sm_55428_1120-38336.xml	1917

sm_55428_1021-36781.xml	1875

sm_55428_1021-37093.xml	1875

sm_55428_1021-37624.xml	1875

sm_55428_1021-37635.xml	1875

sm_55428_1021-37861.xml	1876

sm_55428_1021-38017.xml	1878

sm_55428_1021-38074.xml	1879

sm_55428_1021-40610.xml	1877

sm_55428_1021-41642.xml	1877

sm_55428_1021-42451.xml	1874

sm_55428_1021-42777.xml	1878

sm_55428_1021-42901.xml	1879

sm_55428_1021-43496.xml	1873

sm_55428_1021-44485.xml	1874

sm_55428_1021-44624.xml	1877

sm_55428_1021-4505.xml	1875

sm_55428_1021-45907.xml	1875

sm_55428_10

sm_55428_1024-12049.xml	1879

sm_55428_1024-12923.xml	1877

sm_55428_1024-13528.xml	1878

sm_55428_1024-13761.xml	1878

sm_55428_1024-14203.xml	1878

sm_55428_1024-14489.xml	1877

sm_55428_1024-14775.xml	1877

sm_55428_1024-1882.xml	1877

sm_55428_1024-18862.xml	1878

sm_55428_1024-19180.xml	1875

sm_55428_1024-1942.xml	1878

sm_55428_1024-20197.xml	1879

sm_55428_1024-20929.xml	1879

sm_55428_1024-2173.xml	1879

sm_55428_1024-22115.xml	1877

sm_55428_1024-22403.xml	1879

sm_55428_1024-23267.xml	1881

sm_55428_1024-2



sm_55428_1026-26806.xml	1879

sm_55428_1026-26941.xml	1881

sm_55428_1026-27769.xml	1879

sm_55428_1026-28035.xml	1878

sm_55428_1026-28098.xml	1881

sm_55428_1026-29853.xml	1879

sm_55428_1026-2994.xml	1877

sm_55428_1026-30097.xml	1878

sm_55428_1026-30381.xml	1878

sm_55428_1026-30437.xml	1880

sm_55428_1026-3328.xml	1880

sm_55428_1026-34186.xml	1879

sm_55428_1026-3514.xml	1879

sm_55428_1026-35550.xml	1881

sm_55428_1026-36239.xml	1878

sm_55428_1026-36373.xml	1877

sm_55428_1026-36990.xml	1877



sm_55428_1028-41396.xml	1877

sm_55428_1028-41591.xml	1883

sm_55428_1028-42193.xml	1878

sm_55428_1028-42246.xml	1882

sm_55428_1028-42546.xml	1882

sm_55428_1028-42823.xml	1882

sm_55428_1028-43494.xml	1883

sm_55428_1028-45356.xml	1881

sm_55428_1028-4546.xml	1880

sm_55428_1028-46316.xml	1881

sm_55428_1028-46804.xml	1882

sm_55428_1028-4814.xml	1882

sm_55428_1028-48338.xml	1883

sm_55428_1028-49683.xml	1881

sm_55428_1028-49798.xml	1880

sm_55428_1028-5386.xml	1880

sm_55428_1028-6508.xml	1882



sm_55428_1030-44721.xml	1883

sm_55428_1030-4521.xml	1881

sm_55428_1030-45969.xml	1883

sm_55428_1030-4599.xml	1882

sm_55428_1030-46416.xml	1881

sm_55428_1030-4678.xml	1882

sm_55428_1030-47776.xml	1881

sm_55428_1030-48372.xml	1882

sm_55428_1030-49779.xml	1882

sm_55428_1030-519.xml	1882

sm_55428_1030-5604.xml	1883

sm_55428_1030-6284.xml	1882

sm_55428_1030-6518.xml	1879

sm_55428_1030-7325.xml	1882

sm_55428_1030-7848.xml	1882

sm_55428_1030-8068.xml	1879

sm_55428_1030-8483.xml	1882

sm



sm_55428_1032-7197.xml	1881

sm_55428_1032-7258.xml	1883

sm_55428_1032-7488.xml	1883

sm_55428_1032-7568.xml	1881

sm_55428_1032-8017.xml	1882

sm_55428_1032-8054.xml	1878

sm_55428_1032-8254.xml	1881

sm_55428_1032-8448.xml	1882

sm_55428_1033-1068.xml	1883

sm_55428_1033-10972.xml	1886

sm_55428_1033-11747.xml	1882

sm_55428_1033-15085.xml	1883

sm_55428_1033-16630.xml	1883

sm_55428_1033-16977.xml	1883

sm_55428_1033-18643.xml	1883

sm_55428_1033-18872.xml	1882

sm_55428_1033-19132.xml	1882

sm_55428_1035-4577.xml	1883

sm_55428_1035-46803.xml	1882

sm_55428_1035-47955.xml	1884

sm_55428_1035-48578.xml	1883

sm_55428_1035-49327.xml	1882

sm_55428_1035-5030.xml	1882

sm_55428_1035-6659.xml	1886

sm_55428_1035-6715.xml	1881

sm_55428_1035-7250.xml	1883

sm_55428_1035-8710.xml	1882

sm_55428_1035-929.xml	1884

sm_55428_1035-9536.xml	1884




In [18]:
# for line in open("1004-1119.txt"):
#     try:
#         name, date1, date2 = line.split("\t")
#     except ValueError:
#         print(line)
#     for month in months:
#         if month in date1:
#             try:
#                 if date1.split(", ")[1].isdigit():
#                     year = date1.split(", ")[1]
#             except IndexError:
#                 continue
#             break
#         if month in date2:
#             try:
#                 if date2.split(", ")[1].isdigit():
#                     year = date2.split(", ")[1]
#             except IndexError:
#                 continue
#             break
#     if int(year) in years:
#         years[int(year)] += 1
#     else:
#         years[int(year)] = 1
        
#     print(name + "\t" + year)

In [19]:
years

{'1921': 34,
 '1917': 72,
 '1922': 87,
 'Says Marshall. THINKS CHANGE PERMANENT Revolution Too Deep-Seated to Yield to Reaction;-Movement May Spread to Germany. Russian Jews Loyal Will Help Develop Russia.': 1,
 '1918': 19,
 '1919': 2,
 "Accused by Sumnar. CALLS WORKS 'LITERATURE' Finds Condemned Volumes Deal With Phases of Present Thought. PUBLISHER TO START SUIT Will Demand Damages": 1,
 'Far Out of Beaten Sea Lanes. CARRIED OWNER': 1,
 'but ThisIs Not Credited. The explosion Terrific. Creates Havoc in District. 3 KILLED': 1,
 'Mellon and Tariff': 1,
 '1920': 1,
 '1876': 40,
 '1874': 21,
 '1878': 83,
 '1877': 69,
 '1871': 1,
 '1879': 112,
 '1873': 8,
 '1875': 35,
 '1880': 99,
 '1881': 113,
 '1882': 125,
 '1883': 106,
 '1884': 41,
 '1885': 18,
 '1886': 7,
 '1887': 1}
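
Five of the keys above are headline fragments rather than years. A likely cause is that the loop tests `month in field` as a substring, so a headline containing "Mar" (as in "Marshall") or "May" matches, and whatever follows its first ", " is stored as the year. The sketch below is one possible alternative, not the method that produced the counts above: it searches each field for a full date of the form "Aug 25, 1881" with a regular expression and sets undated records aside. The helper name `extract_year` and the sample records (including the headline) are hypothetical.

```python
import re
from collections import Counter

# Matches dates such as "Aug 25, 1881" and captures the four-digit year.
DATE_RE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? \d{1,2}, (\d{4})\b"
)

def extract_year(*fields):
    """Return the year of the first field containing a full date, else None."""
    for field in fields:
        if not field:
            continue  # skip missing (None or empty) fields
        match = DATE_RE.search(field)
        if match:
            return match.group(1)
    return None

# Hypothetical records shaped like files[file][1:3]
sample = {
    "a.xml": ("Marshall Speaks, May Run Again", "Mar 25, 1917"),
    "b.xml": ("Nov 15, 1881", "18811115"),
    "c.xml": ("Mellon and Tariff", None),
}

year_counts = Counter()
undated = []
for name, fields in sample.items():
    year = extract_year(*fields)
    if year is None:
        undated.append(name)  # inspect these by hand instead of guessing
    else:
        year_counts[year] += 1

print(dict(year_counts))
print(undated)
```

Keeping the undated files in a separate list avoids silently reusing the previous file's year, at the cost of a manual check for those records.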

In [20]:
# sorted keys
for key, value in sorted(years.items(), key=lambda x: x[0]): 
    print("{} : {}".format(key, value))

1871 : 1
1873 : 8
1874 : 21
1875 : 35
1876 : 40
1877 : 69
1878 : 83
1879 : 112
1880 : 99
1881 : 113
1882 : 125
1883 : 106
1884 : 41
1885 : 18
1886 : 7
1887 : 1
1917 : 72
1918 : 19
1919 : 2
1920 : 1
1921 : 34
1922 : 87
Accused by Sumnar. CALLS WORKS 'LITERATURE' Finds Condemned Volumes Deal With Phases of Present Thought. PUBLISHER TO START SUIT Will Demand Damages : 1
Far Out of Beaten Sea Lanes. CARRIED OWNER : 1
Mellon and Tariff : 1
Says Marshall. THINKS CHANGE PERMANENT Revolution Too Deep-Seated to Yield to Reaction;-Movement May Spread to Germany. Russian Jews Loyal Will Help Develop Russia. : 1
but ThisIs Not Credited. The explosion Terrific. Creates Havoc in District. 3 KILLED : 1


In [21]:
# sorted values
for key, value in sorted(years.items(), key=lambda x: x[1]): 
    print("{} : {}".format(key, value))

Says Marshall. THINKS CHANGE PERMANENT Revolution Too Deep-Seated to Yield to Reaction;-Movement May Spread to Germany. Russian Jews Loyal Will Help Develop Russia. : 1
Accused by Sumnar. CALLS WORKS 'LITERATURE' Finds Condemned Volumes Deal With Phases of Present Thought. PUBLISHER TO START SUIT Will Demand Damages : 1
Far Out of Beaten Sea Lanes. CARRIED OWNER : 1
but ThisIs Not Credited. The explosion Terrific. Creates Havoc in District. 3 KILLED : 1
Mellon and Tariff : 1
1920 : 1
1871 : 1
1887 : 1
1919 : 2
1886 : 7
1873 : 8
1885 : 18
1918 : 19
1874 : 21
1921 : 34
1875 : 35
1876 : 40
1884 : 41
1877 : 69
1917 : 72
1878 : 83
1922 : 87
1880 : 99
1883 : 106
1879 : 112
1881 : 113
1882 : 125


In [22]:
# sanity check: every stored article should pass the keyword filter
t = 0
f = 0

for file in files:
    if is_censorship(files[file][0]):
        t += 1
    else:
        f += 1
print("true = ", t, "false = ", f)

true =  1099 false =  0


In [23]:
# is_censorship(files[""])

In [24]:
len(files)

1099

In [25]:
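# "en" is a shortcut link from spaCy 2.x; spaCy 3+ removed shortcut links,
# so there the model is loaded by its full name, e.g. spacy.load("en_core_web_sm")
nlp = spacy.load("en")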

In [26]:
# tokenize each article and keep only the lemmas of informative tokens
for file in files:
    article = []
    doc = nlp(files[file][0].lower())
    for w in doc:
        # skip stop words, punctuation, numbers, short tokens, and XML entity remnants
        if (not w.is_stop and not w.is_punct and not w.like_num
                and w.text != 'I' and '&' not in w.text and ';' not in w.text
                and '$' not in w.text and len(w.text) > 2 and ' ' not in w.text
                and "apos" not in w.text and "quot" not in w.text):
            # we add the lemmatized version of the word
            article.append(w.lemma_)
    # the token list is stored as the last element of the record
    files[file].append(article)

In [27]:
# stop word list changelog
# 12/2: added "apos" and "quot" as undesirable remnants of .xml punctuation codings

Now that we have our raw corpus, let us clean it.

## Clean Corpus

We clean it by removing stop words, punctuation, numbers, and whitespace tokens, dropping other noise such as leftover XML entity fragments, and lemmatizing what remains.

In [28]:
# iterate through corpus, clean code

In [29]:
from gensim.corpora import Dictionary
import gensim
from gensim.models import CoherenceModel, LdaModel, LsiModel, HdpModel

In [30]:
cleaned_texts = []

In [31]:
# index 3 of each record holds the token list appended by the cleaning loop
for file in files:
    cleaned_texts.append(files[file][3])

In [32]:
bigram = gensim.models.Phrases(cleaned_texts)

In [33]:
texts = [bigram[line] for line in cleaned_texts]



In [34]:
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

In [35]:
ldamodel = LdaModel(corpus=corpus, num_topics=2, id2word=dictionary)

In [36]:
ldamodel.show_topics()

[(0,
  '0.006*"tho" + 0.004*"say" + 0.004*"man" + 0.003*"lie" + 0.003*"day" + 0.003*"time" + 0.003*"state" + 0.002*"great" + 0.002*"good" + 0.002*"know"'),
 (1,
  '0.008*"tho" + 0.004*"man" + 0.003*"say" + 0.003*"time" + 0.003*"good" + 0.003*"lie" + 0.002*"tlio" + 0.002*"work" + 0.002*"great" + 0.002*"day"')]
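
Both topics are dominated by the same high-frequency tokens, including `tho` and `tlio`, which look like OCR misreadings (an assumption, not something verified against the scans). One possible way to reduce this is to drop a hand-made list of OCR artefacts, plus any token that occurs in too large a share of documents, before building the dictionary. The sketch below is plain Python with hypothetical names (`OCR_NOISE`, `prune_tokens`, `max_doc_share`) and toy documents; gensim's `Dictionary.filter_extremes` offers a comparable document-frequency cut.

```python
from collections import Counter

# Hypothetical list of OCR artefacts seen in the topics above; extend as needed.
OCR_NOISE = {"tho", "tlio"}

def prune_tokens(docs, noise=OCR_NOISE, max_doc_share=0.5):
    """Drop noise tokens and tokens present in more than max_doc_share of docs."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    limit = max_doc_share * len(docs)
    return [
        [tok for tok in doc if tok not in noise and doc_freq[tok] <= limit]
        for doc in docs
    ]

# Toy documents standing in for the tokenized articles
toy_docs = [
    ["tho", "censor", "play", "man"],
    ["tlio", "press", "war", "man"],
    ["tho", "book", "court", "man"],
    ["mail", "censor", "cable", "man"],
]

pruned = prune_tokens(toy_docs)
print(pruned)
```

In this notebook the function would be applied to `texts` before `Dictionary(texts)` is built; the threshold and the noise list are guesses to tune against the real corpus.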

Now that we have our cleaned corpus, we can sort it by month, by year, and by decade. We can also keep one grouping for the total corpus.
Now compare!
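
Sorting by month, year, and decade needs real dates rather than strings. A minimal sketch, assuming each article carries a date string such as "Apr 7, 1879" (the style seen in the file listing): `datetime.strptime` with the format `%b %d, %Y` turns it into a date that sorts correctly, and integer division gives the decade. The record names and the helpers `parse_pub_date` and `decade_of` are hypothetical, and `%b` assumes an English locale.

```python
from collections import defaultdict
from datetime import datetime

def parse_pub_date(text):
    """Parse a date such as 'May 20, 1892' (English month abbreviations assumed)."""
    return datetime.strptime(text.strip(), "%b %d, %Y").date()

def decade_of(date):
    """Return the first year of the decade, e.g. 1892 -> 1890."""
    return date.year // 10 * 10

# Hypothetical (filename, date string) pairs
records = [
    ("c.xml", "May 20, 1892"),
    ("a.xml", "Apr 7, 1879"),
    ("b.xml", "May 19, 1892"),
    ("d.xml", "Jan 24, 1885"),
]

# Chronological order: compare parsed dates, not raw strings
ordered = sorted(records, key=lambda rec: parse_pub_date(rec[1]))

# Group filenames by decade, keeping chronological order inside each group
by_decade = defaultdict(list)
for name, text in ordered:
    by_decade[decade_of(parse_pub_date(text))].append(name)

print([name for name, _ in ordered])
print(dict(by_decade))
```

The same grouping key could be swapped for `(date.year, date.month)` or `date.year` to get monthly or yearly buckets.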

#### project outline

1. Reduce the corpus to only the relevant files, that is, files containing at least one of a list of keywords (e.g., censorship, ban, suppress; potentially only 'censorship', if we want a semantic/discourse analysis of that particular term, its employment, and its movement). Note: 'corpus' from now on refers to this reduced, relevant corpus.
2. Clean the files, removing all unnecessary information and keeping only the important information they hold. For example, we will keep the FullText section, the publication date, and possibly the title (and subtitle).
3. Write code that orders the corpus chronologically. There is a numeric publication date that might be easier to follow. Alternatively, for the alphabetic publication date: each file is dated in the style of, for example, "May 20, 1892". We need code that recognizes these dates as ordered: May is after April and before June; May 20 is before May 21 and after May 19; May 20, 1892 is before May 20, 1893 and after May 20, 1891; and so on.
4. From here, we will process the chronologically-ordered and -recognized corpus in such a way that we can find latent features in the corpus:
    
    a. We can do this for the entire corpus to see whether a continuous latent theme (or set of themes) appears, that is, whether 'censorship' itself has, from 1857 to 1922, carried a theme we did not expect.
    
    b. We can do this for time segments (which I refer to as the 'episteme model'), whereby we segment the dates and THEN find latent features for each individual segment. What are the (unexpected) themes to be found in certain time periods? We can then cross-reference this with secondary/tertiary sources produced by historians to see what was happening during these periods of time.
        Ex.1 We can segment periods arbitrarily, such as
                Period 1 = 1857-1876
                Period 2 = 1877-1896
                Period 3 = 1897-1916
                Period 4 = 1917-1922
        Ex.2 We can segment periods which align with major events, such as
                Period 1 = Civil War
                Period 2 = Post-Civil War
                Period 3 = World War 1
                Period 4 = Film Industry emerges
        Ex.3 We can have OVERLAPS between time segments rather than cutting them off from one another. 
                Period 1 = Civil War AND Post-Civil War
                Period 2 = Post-Civil War AND World War 1
                Period 3 = World War 1 AND Film Industry emerges
             This could be done, say, in addition to Ex.1 or Ex.2 so that a) through Ex.1 or Ex.2 we can see what changes between periods and b) through Ex.3 we can see what CONTINUES between those segments.
     c. 