In the terminal, before starting:

```
pip install wikipedia
pip install --force-reinstall openpyxl
```


## Libraries

Let's start with the basic dependencies.

In [None]:
import pandas as pd
import numpy
import wikipedia

## Get the data

Now import the spreadsheet. The code is based on https://alan-turing-institute.github.io/TuringDataStories-fastpages/data%20exploration/data%20wrangling/data%20visualisation/digital%20humanities/2022/03/09/Desert-Island-Discs.html

I'm using the version of the spreadsheet that includes Wikipedia links, so I don't have to take care of disambiguation on my own. This means the most recent episodes are not included. 

In [87]:
hc_data_url = (
    "https://hcommons.org/deposits/download/hc:42512/CONTENT/desert_island_discs.xlsx"
)

castaways_df = pd.read_excel(
    hc_data_url,
    sheet_name="castaways",
    parse_dates=["date", "date_of_birth"],
    index_col=0
)
castaways_df["year"] = castaways_df["date"].dt.year
castaways_df.head(1)

Unnamed: 0,episode_ref,date,castaway_ref,name,std_name,gender,profession,country_of_citizenship,place_of_birth,country_of_birth,date_of_birth,favTrack,luxury,book,wiki_link,link,year
0,2828,1942-01-29,2855,Vic Oliver,vic oliver,male,"['actor', 'performing artist', 'artist', 'prof...",Austria-Hungary,Vienna,Austria,1898-07-08,NA by NA,,,https://en.wikipedia.org/wiki/Vic_Oliver,https://www.bbc.co.uk/programmes/p009y0nq,1942


## Get the Wikipedia Categories

For each castaway, let's open the Wikipedia article and collect every category into one big set.

In [88]:
castaways_df['wiki_id'] = castaways_df['wiki_link'].replace('https://en.wikipedia.org/wiki/', '', regex=True)
castaways_df.head(1)

Unnamed: 0,episode_ref,date,castaway_ref,name,std_name,gender,profession,country_of_citizenship,place_of_birth,country_of_birth,date_of_birth,favTrack,luxury,book,wiki_link,link,year,wiki_id
0,2828,1942-01-29,2855,Vic Oliver,vic oliver,male,"['actor', 'performing artist', 'artist', 'prof...",Austria-Hungary,Vienna,Austria,1898-07-08,NA by NA,,,https://en.wikipedia.org/wiki/Vic_Oliver,https://www.bbc.co.uk/programmes/p009y0nq,1942,Vic_Oliver
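
Wikipedia URLs can percent-encode characters such as accents, so a slightly more defensive version of this step might also decode the title. A minimal sketch, where `wiki_id_from_url` is a hypothetical helper name rather than something from the notebook:

```python
from urllib.parse import unquote

WIKI_PREFIX = "https://en.wikipedia.org/wiki/"


def wiki_id_from_url(url):
    """Return the article title part of an English Wikipedia URL, or None."""
    # Missing values (NaN, None) and unexpected URLs are treated as "no link".
    if not isinstance(url, str) or not url.startswith(WIKI_PREFIX):
        return None
    # Strip the prefix and decode percent-escapes such as %C3%A9.
    return unquote(url[len(WIKI_PREFIX):])


print(wiki_id_from_url("https://en.wikipedia.org/wiki/Vic_Oliver"))
```

It could be applied with `castaways_df['wiki_link'].map(wiki_id_from_url)`; missing links come back as `None`, which `pd.isna` still treats as missing.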


In [89]:
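all_categories = set()
for page_id in castaways_df['wiki_id']:
    # Skip castaways that have no Wikipedia link.
    if pd.isna(page_id):
        continue
    try:
        # auto_suggest=False uses the title exactly as given rather than a search suggestion.
        page = wikipedia.page(page_id, auto_suggest=False)
        all_categories.update(page.categories)
    except Exception:
        # Skip pages that fail to load (missing, ambiguous, or network errors).
        continue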
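
Both this loop and the matching loop later on fetch every page from Wikipedia, which is slow. One option, sketched here under the assumption that a plain in-memory dictionary is enough, is to fetch each page's categories once and reuse them. `build_category_cache` and `fake_fetch` are hypothetical names, and the lookup function is passed in so the sketch runs without network access:

```python
def build_category_cache(page_ids, fetch_categories):
    """Map each page id to its list of categories, or None if the lookup failed."""
    cache = {}
    for page_id in page_ids:
        if page_id is None or page_id != page_id:  # None or NaN: no link
            continue
        if page_id in cache:  # castaways can appear more than once
            continue
        try:
            cache[page_id] = list(fetch_categories(page_id))
        except Exception:
            cache[page_id] = None
    return cache


# Stand-in lookup with made-up categories, used instead of a real Wikipedia call.
def fake_fetch(page_id):
    fake_pages = {"Page_A": ["Category 1", "Category 2"], "Page_B": ["Category 2"]}
    return fake_pages[page_id]  # raises KeyError for unknown pages


cache = build_category_cache(["Page_A", "Page_B", "Page_A", None, "Missing"], fake_fetch)
all_categories = set()
for categories in cache.values():
    if categories is not None:
        all_categories.update(categories)
print(sorted(all_categories))
```

With the real library, the lookup could be `lambda page_id: wikipedia.page(page_id, auto_suggest=False).categories`, which is the same call the notebook already makes.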





In [90]:
pd.Series(list(all_categories)).sort_values().to_csv("categories.csv")

Next, I manually pick the categories that should identify disabled interviewees.

There's more on the thinking behind this in the blog post, but the result is `included_categories.csv`.

## Get the Matching People

First, let's import the included categories:

In [91]:
included_categories_csv = pd.read_csv("included_categories.csv")
included_categories = set(included_categories_csv["category"])
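
The only thing the code above relies on is that the CSV has a `category` column. As an illustration of how the match in the next cell works, here is a small self-contained sketch. It uses the standard `csv` module instead of pandas so it runs anywhere, and the category names are placeholders, not the real list:

```python
import csv
import io

# Placeholder file contents standing in for included_categories.csv.
csv_text = "category\nPlaceholder category A\nPlaceholder category B\n"
included = {row["category"] for row in csv.DictReader(io.StringIO(csv_text))}

# Placeholder category lists for two imaginary pages.
page_one = ["Placeholder category A", "Some other category"]
page_two = ["Some other category", "Yet another category"]

# A page matches if it shares at least one category with the included set.
print(bool(included.intersection(page_one)))
print(bool(included.intersection(page_two)))
```

An empty intersection is falsy, which is why the `if` in the matching loop needs no explicit length check.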

In [92]:
disability_column = []
for page_id in castaways_df['wiki_id']:
    # No Wikipedia link: status unknown.
    if pd.isna(page_id):
        disability_column.append(None)
        continue
    try:
        page = wikipedia.page(page_id, auto_suggest=False)
        # Flag the castaway if any of their categories is in the hand-picked set.
        if included_categories.intersection(page.categories):
            print(page.title)
            disability_column.append(True)
        else:
            disability_column.append(False)
    except Exception:
        # Page failed to load: status unknown.
        disability_column.append(None)

Eric Sykes
Wee Georgie Wood
George Shearing
Michael Flanders
David Hockney
Michael Ansell
Barry Humphries
Itzhak Perlman
David Scott Blackhall
Tristan Jones
Richard Leakey
Douglas Bader
Rosemary Sutcliff
Christopher Reeve
Ved Mehta
Anthony Hopkins
Bobby Robson
Stephanie Beacham
David Blunkett
Sue Townsend
Stephen Hawking
Evelyn Glennie
Jack Ashley, Baron Ashley of Stoke
Oliver Sacks
John Wilson (blind activist)
Gordon Brown
Eric Sykes
Ian Dury
Tanni Grey-Thompson
Bobby Robson
Frank Gardner (journalist)
Thomas Quasthoff
Barry Humphries
Jane Campbell, Baroness Campbell of Surbiton
Ade Adepitan
Chris Packham
Yinka Shonibare
Warwick Davis
Philip Craven


In [93]:
castaways_df["disability_status"] = disability_column

In [94]:
disabled_castaways = castaways_df[castaways_df["disability_status"] == True]
disabled_castaways.head()

Unnamed: 0,episode_ref,date,castaway_ref,name,std_name,gender,profession,country_of_citizenship,place_of_birth,country_of_birth,date_of_birth,favTrack,luxury,book,wiki_link,link,year,wiki_id,disability_status
362,2954,1957-12-09,852,Eric Sykes,eric sykes,male,"['screenwriter', 'writer', 'author', 'creator'...",United Kingdom,Oldham,United Kingdom,1923-05-04,NA by NA,,,https://en.wikipedia.org/wiki/Eric_Sykes,https://www.bbc.co.uk/programmes/p009y8l4,1957,Eric_Sykes,True
562,2608,1961-11-06,2893,Wee Georgie Wood,wee georgie wood,male,"['comedian', 'actor', 'performing artist', 'ar...",,,,1894-12-17,Private Lives Act 1 by Noel Coward & Gertrude ...,Writing materials,Memoirs of a Midget - Walter de la Mare,https://en.wikipedia.org/wiki/Wee_Georgie_Wood,https://www.bbc.co.uk/programmes/p009y6ck,1961,Wee_Georgie_Wood,True
613,2558,1962-11-05,985,George Shearing,george shearing,male,"['composer', 'musician', 'artist', 'profession...",United Kingdom,London,Roman Empire,1919-08-13,Cantata No. 54: Widerstehe doch der Sünde by ...,Metal construction set,Masterman Ready - Frederick Marryat,https://en.wikipedia.org/wiki/George_Shearing,https://www.bbc.co.uk/programmes/p009y5t1,1962,George_Shearing,True
769,2396,1965-12-27,1850,Michael Flanders,michael flanders,male,"['actor', 'performing artist', 'artist', 'prof...",United Kingdom,London,Roman Empire,1922-03-01,Sanctus / Agnus Dei from Missa Criolla by Arie...,Horn,Blank book and pencils,https://en.wikipedia.org/wiki/Michael_Flanders,https://www.bbc.co.uk/programmes/p009y3r8,1965,Michael_Flanders,True
1082,2080,1972-02-07,603,David Hockney,david hockney,male,"['painter', 'visual artist', 'artist', 'profes...",England,Bradford,United Kingdom,1937-07-09,Liebestod (from Tristan und Isolde) by NA,"Paper, pencils and a battery-operated sharpener",Route 69 - Floyd Carter,https://en.wikipedia.org/wiki/David_Hockney,https://www.bbc.co.uk/programmes/p009nc50,1972,David_Hockney,True


In [95]:
disabled_castaways.to_csv("disabled_castaways.csv")
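
Some castaways appeared on the programme more than once, so the names printed by the matching loop contain repeats (Eric Sykes, Bobby Robson and Barry Humphries each show up twice). That means the number of rows is a count of episodes, not people. A small sketch of the difference, using a handful of the printed names rather than the full list:

```python
from collections import Counter

# A sample of the names printed by the matching loop, repeats included.
printed_names = [
    "Eric Sykes",
    "Wee Georgie Wood",
    "George Shearing",
    "Barry Humphries",
    "Bobby Robson",
    "Eric Sykes",
    "Bobby Robson",
    "Barry Humphries",
]

counts = Counter(printed_names)
# Names that were printed more than once.
repeat_guests = sorted(name for name, n in counts.items() if n > 1)

print("episodes:", len(printed_names))
print("people:", len(counts))
print("repeat guests:", repeat_guests)
```

On the real DataFrame, `disabled_castaways["wiki_id"].nunique()` should give the number of distinct people.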

Finally, let's get the core data into Markdown ready to use in the blog. 

I'll edit it manually for readability before publishing. 
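
The layout is three lines per castaway: a bold name with the year, a one-sentence summary, and two links. Here is a sketch of just the formatting, with `format_entry` as a hypothetical helper and a placeholder summary, so it runs without calling Wikipedia:

```python
def format_entry(title, year, summary, wiki_url, episode_url):
    """Build the three Markdown lines for one castaway."""
    heading = "- **{}** ({}) - ".format(title, year)
    links = "[{} on Wikipedia]({}), [{} on Desert Island Discs]({}), ".format(
        title, wiki_url, title, episode_url
    )
    return "\n".join([heading, summary, links])


entry = format_entry(
    "Eric Sykes",
    1957,
    "(one-sentence summary goes here)",  # placeholder, not real Wikipedia text
    "https://en.wikipedia.org/wiki/Eric_Sykes",
    "https://www.bbc.co.uk/programmes/p009y8l4",
)
print(entry)
```

Keeping the formatting separate from the lookups makes it easy to tweak the layout without re-fetching every page.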

In [96]:
for i in disabled_castaways.index:
    row = disabled_castaways.loc[i]
    castaway_id = row["wiki_id"]
    page = wikipedia.page(castaway_id, auto_suggest=False)
    episode_url = row["link"]
    # Heading line: name and broadcast year.
    print("- **{}** ({}) - ".format(page.title, row["year"]))
    # One-sentence summary from the Wikipedia article.
    print(wikipedia.summary(castaway_id, sentences=1, auto_suggest=False))
    # Links to the Wikipedia article and the episode page.
    print("[{} on Wikipedia]({}), [{} on Desert Island Discs]({}), ".format(page.title, page.url, page.title, episode_url))


- **Eric Sykes** (1957) - 
Eric Sykes  (4 May 1923 – 4 July 2012) was an English radio, stage, television and film writer, comedian, actor, and director whose performing career spanned more than 50 years.
[Eric Sykes on Wikipedia](https://en.wikipedia.org/wiki/Eric_Sykes), [Eric Sykes on Desert Island Discs](https://www.bbc.co.uk/programmes/p009y8l4), 
- **Wee Georgie Wood** (1961) - 
George Wood Bamlett OBE (17 December 1895 – 19 February 1979), known professionally as Wee Georgie Wood, was a British comic entertainer and actor who appeared in films, plays and music hall revues.
[Wee Georgie Wood on Wikipedia](https://en.wikipedia.org/wiki/Wee_Georgie_Wood), [Wee Georgie Wood on Desert Island Discs](https://www.bbc.co.uk/programmes/p009y6ck), 
- **George Shearing** (1962) - 
Sir George Albert Shearing,  (13 August 1919 –  14 February 2011) was a British jazz pianist who for many years led a popular jazz group that recorded for Discovery Records, MGM Records and Capitol Records.
[Georg