# The contours of the JISC Corpus

As the JISC corpus is not readily available to everyone, we provide a list of the titles.
This notebook explains which newspapers have been categorized as belonging to the JISC corpus and with which entry in Mitchell's they are associated.

In [1]:
from pathlib import Path
import pandas as pd
import pickle

## Load data

In [2]:
path = Path('../data/Press_Directories_1846_1920_JISC_final.csv')
df = pd.read_csv(path,index_col=0)

In [3]:
jisc_meta = pd.read_excel('../data/JISC_TitleList.xlsx', sheet_name='Titles')
jisc_meta.head(2)

Unnamed: 0,Newspaper Title,System ID,NLP,CATEGORY,JISC,Normalised Title,Abbr,Start_day,Start_month,Start_year,End_day,End_month,End_year
0,Aberdeen Journal and general advertiser for th...,13921360,31,scottish,JISC1,Aberdeen Journal,ANJO,1,Jan,1800,23,Aug,1876
1,Aberdeen Weekly Journal and general advertiser...,13921362,32,scottish,JISC1,Aberdeen Journal,ANJO,30,Aug,1876,31,Dec,1900


# Select Titles

For the paper we only looked at provincial titles (in the sense of non-Metropolitan) that were still being published in or after 1846, when the first edition of Mitchell's appeared.

In [4]:
jsp = jisc_meta[jisc_meta.CATEGORY.isin(['scottish','welsh','provincial','irish']) & (jisc_meta.End_year >= 1846)]
list(jsp['Newspaper Title'])

['Aberdeen Journal and general advertiser for the north of Scotland, The',
 'Aberdeen Weekly Journal and general advertiser for the north of Scotland',
 'Baner Cymru',
 'Baner ac Amserau Cymru',
 'Bath Chronicle, The',
 'Belfast News-Letter',
 'Birmingham Daily Post',
 'Blackburn Standard, The',
 'Blackburn Standard: Darwen Observer, and North-East Lancashire Advertiser, The',
 'Blackburn Standard and Weekly Express, The',
 'Weekly Standard and Express, The',
 'Bristol Mercury',
 'Bristol Mercury and Daily Post, the',
 'Bury and Norwich Post',
 'Caledonian Mercury',
 'Caledonian Mercury and Daily Express, The',
 'Caledonian Mercury, The',
 'Cheshire Observer and General Advertiser: for Cheshire and North Wales',
 'Cheshire Observer and Chester, Birkenhead and North Wales Times',
 'Cheshire Observer',
 'North-Eastern Daily Gazette (Middlesbrough), The',
 'The Evening Gazette for Middlesbrough, Stockton and District',
 'The Daily Gazette for Middlesbrough, Stockton and District',
 'Derby

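Below we list the titles from the press directories that were categorized as being in JISC. The titles are shown as they appear in the data, including transcription errors.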

In [5]:
sorted(df[df.IN_JISC > 0].TITLE.unique())

[' HUDDERSFIELDCHRONICLE .',
 ' NEWCASTLE WEEKLY COURANT AND NORTH OF ENGILAND FARMER .',
 ' NORTH-EASTERN DAILY GAZETTE .',
 ' NORTH-EASTERN DAILY Gazette .',
 'ABERDEEN JOURNAL .',
 'ABERDEEN JOURNAL AND CENTRAL ADVERTISER FOR THE NORTH OF SCOTLAND .',
 'ABERDEEN JOURNAL AND GENERAL AD - — — VWERTISER FOR THE NORTH OF SCOTLAND .',
 'ABERDEEN JOURNAL AND GENERAL ADVER - TISER FOR THE NORTH OF SCOTLAND .',
 'ABERDEEN JOURNAL AND GENTRAL AD - — — VERTISER FOR THE NORTH OF SCOTLAND .',
 'ABERDEEN WEEKLY JOURNAL .',
 'ABERDEEN WEEKLY JOURNAl .',
 'BANER AC AMSERAU CYMRU (',
 'BANER AC AMSERAU CYMRU ( Banner and Wales ) .',
 'BANER AC AMSERAU CYMRU ( LiBERAL and',
 'BANER AC AMSERAU CYMRU ( Wales ) .',
 'BANER AC AMSERAU CYMRU ( YWWoales ) .',
 'BANER AC AMSERAU CYMRU .',
 'BANER AC AMSERAU CYMRU . BANNER AND IIMES OF WALES . )',
 'BANER AC AMSERAU CYMRU . BANNER AND TIMES OF WALES . ) CYMRO .',
 'BATH CHRONICLE .',
 'BATH DAILY CHRONICLE .',
 'BATH EVENING CHRONICLE .',
 'BELFAST NEWS LET

The folder `../data/jisc_links` contains annotations in which we manually labelled pairs of titles (one from JISC, one from Mitchell's) as referring to the same newspaper (labelled "same") or to different newspapers (labelled "different"). For each "same" pair, we then extended the BL System ID to the later entries with the same `NEWSPAPER_ID`, up to the end year of the JISC title. Below we create a table that allows you to compare each JISC title with the corresponding entries in Mitchell's.

In [12]:

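def get_links(pickle_path):
    # The annotation year is the last underscore-separated part of the file name.
    year = int(pickle_path.stem.split('_')[-1])
    with open(pickle_path, 'rb') as f:
        annotations = pickle.load(f)
    # Keep only the pairs annotated as referring to the same newspaper.
    same = [a for a in annotations if a[-1] == 'same']
    links = []
    for obs, _ in same:
        # obs[1]: BL System ID, obs[2]: id of the Mitchell's entry, obs[3]: NEWSPAPER_ID
        jisc_row = jisc_meta[jisc_meta['System ID'] == obs[1]]
        jisc_title = jisc_row['Newspaper Title'].values[0]
        end_year = jisc_row['End_year'].values[0]
        mitchell_title = df[df.id == obs[2]]['TITLE'].values[0]
        # Later entries of the same newspaper, up to the end year of the JISC title.
        chain_titles = df[(df.NEWSPAPER_ID == obs[3]) & (df.YEAR > year) &
                          (df.YEAR <= end_year)]['TITLE'].values
        links.append(['manual', obs[1], obs[2], jisc_title, mitchell_title])
        links.extend([['newspaper_id', obs[1], obs[3], jisc_title, title] for title in chain_titles])
    return links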
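
To make the chaining step concrete, here is a minimal sketch on made-up data. The frame `toy_df`, its IDs, years and titles, and the helper `chain_titles_for` are hypothetical and only serve as illustration; the column names `NEWSPAPER_ID`, `YEAR` and `TITLE` and the shape of the filter are taken from `get_links`.

```python
import pandas as pd

# Hypothetical stand-in for the press directories table (all values invented).
toy_df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'NEWSPAPER_ID': [7, 7, 7, 9],
    'YEAR': [1846, 1851, 1880, 1851],
    'TITLE': ['EXAMPLE JOURNAL .', 'EXAMPLE JOURNAL .',
              'EXAMPLE WEEKLY JOURNAL .', 'OTHER PAPER .'],
})

def chain_titles_for(df, newspaper_id, annotation_year, end_year):
    """Return titles of later entries in the same chain, up to end_year."""
    mask = (
        (df.NEWSPAPER_ID == newspaper_id)
        & (df.YEAR > annotation_year)
        & (df.YEAR <= end_year)
    )
    return list(df.loc[mask, 'TITLE'])

# Assume the manual annotation was made on the 1846 entry and that the
# (hypothetical) JISC title ends in 1876.
chained = chain_titles_for(toy_df, newspaper_id=7, annotation_year=1846, end_year=1876)
print(chained)
```

In this toy case only the 1851 entry is chained: the 1846 entry is the manually annotated one and is recorded separately, the 1880 entry falls after the assumed end year, and the last row belongs to a different `NEWSPAPER_ID`.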

In [13]:
annotation_files = list(Path('../data/jisc_links/').glob('*.pickle'))
links = []
for af in annotation_files:
    links.extend(get_links(af))

jisc_link_df = pd.DataFrame(links,columns=['LINKING_METHOD','BL_SYSTEM_ID',"NPD_ID",'JISC_TITLE','MITCHELL_TITEL'])
jisc_link_df = jisc_link_df.sort_values(by=['BL_SYSTEM_ID'])
jisc_link_df.to_csv('../data/jisc_links.csv')
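
One possible way to inspect such a table is to count, per JISC title, how many Mitchell's entries were linked manually and how many via the newspaper ID. The sketch below uses a small invented frame, `toy_links`, with the same column layout (including the `MITCHELL_TITEL` spelling used above); the IDs and titles are hypothetical and not taken from the data.

```python
import pandas as pd

# Hypothetical rows in the same layout as jisc_link_df (invented IDs and titles).
toy_links = pd.DataFrame(
    [
        ['manual', 111, 101, 'Example Journal, The', 'EXAMPLE JOURNAL .'],
        ['newspaper_id', 111, 7, 'Example Journal, The', 'EXAMPLE JOURNAL .'],
        ['manual', 222, 205, 'Other Paper', 'OTHER PAPER .'],
    ],
    columns=['LINKING_METHOD', 'BL_SYSTEM_ID', 'NPD_ID', 'JISC_TITLE', 'MITCHELL_TITEL'],
)

# Count linked entries per JISC title, with one column per linking method.
summary = (
    toy_links
    .groupby(['BL_SYSTEM_ID', 'JISC_TITLE', 'LINKING_METHOD'])
    .size()
    .unstack(fill_value=0)
)
print(summary)
```

A JISC title with a manual link but no `newspaper_id` links may simply have no later directory entries within its run, so a zero there is not necessarily an error.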

In [11]:
print('All done!')

All done!


# Fin.