Living-with-machines/dhoxss-text2tech

DHOxSS - Text to Tech

Materials for the Text to Tech workshop at the Digital Humanities Oxford Summer School

Binder & Colab

The workshop will mostly rely on Binder for the hands-on activities.

Some notebooks can be run on Google Colab for computational reasons:

Colab

Day 1

  • Intro to Python (a) Open In Colab
  • Intro to Python (b) Open In Colab
  • Functions Open In Colab
  • Opening Files Open In Colab

Day 2

  • Basic Text Processing Open In Colab

  • Lists, sets and tuples Open In Colab

  • Regular Expressions Open In Colab

  • Text Processing Exercises Open In Colab

Day 3

  • Dictionaries and JSON Open In Colab

  • Data Structures Exercises Open In Colab

  • Libraries Open In Colab

  • Working with tabular data Open In Colab

  • Working with XML Open In Colab

Day 5

LM and Word2Vec Presentation

Link to slides

Word2Vec Notebooks

Note: To run notebooks on Colab you need to install some required libraries. For example, to run the Word2Vec notebooks, add a cell with these commands:

!pip install gensim spacy
  • Exploring Word2Vec Open In Colab
  • Training a Word2Vec model Open In Colab
  • Visualizing Word2Vec vector spaces Open In Colab
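
Before opening the Word2Vec notebooks on Colab, it can help to check whether the libraries named in the note above are already available. The following is a minimal sketch, not part of the course materials: the helper name `find_missing_modules` is hypothetical, and it assumes the import names match the pip package names (which holds for gensim and spacy).

```python
import importlib.util

# Top-level module names for the libraries mentioned in the note above.
REQUIRED_MODULES = ["gensim", "spacy"]


def find_missing_modules(module_names):
    """Return the names that cannot be imported in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
    print("Install them in a new cell with: !pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```

If anything is reported as missing, run the suggested `!pip install` command in its own cell and then re-run the check.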

Language Models Notebooks

  • Pretrained Language Models: GPT-2 and BERT Open In Colab
  • Large Language Models: BLOOM and ChatGPT Open In Colab

Local installation

  • Install Anaconda
  • Download the content of this repository and unzip
  • Open Anaconda Navigator
  • From Anaconda, create environment py39
  • Install JupyterLab in environment
  • Launch JupyterLab
  • Open a terminal in JupyterLab
  • Write the following in the terminal, step-by-step:
    • conda activate py39
    • Update pip: pip install --upgrade pip
    • Change directory using the cd command in the terminal until you are in the course folder. There you should run: pip install -r requirements.txt
    • Add the environment to Jupyter, either by following the instructions from here or by running ipython kernel install --user --name=py39. You can then start using the notebooks: select py39 as the kernel (restart JupyterLab if the correct kernel does not show).
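
Once the steps above are done, a quick way to confirm that the py39 kernel sees the installed packages is to run a short check in a notebook cell. This is a rough sketch under stated assumptions: the helper names `requirement_name` and `missing_requirements` are hypothetical, the notebook is assumed to be opened from the course folder so that requirements.txt is in the working directory, and only simple lines such as `name` or `name==1.0` are handled (URLs and pip options are not).

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def requirement_name(line):
    """Extract the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line or line.startswith("-"):  # skip blanks and pip options like -r
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_requirements(lines):
    """Return the package names from the given lines that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the package is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# The executable path should point inside the py39 environment.
print("Python executable:", sys.executable)

requirements_file = Path("requirements.txt")
if requirements_file.exists():
    lines = requirements_file.read_text(encoding="utf-8").splitlines()
    print("Missing packages:", missing_requirements(lines) or "none")
else:
    print("requirements.txt not found; run this from the course folder.")
```

If the executable path does not point to the py39 environment, the notebook is probably using a different kernel; switch kernels and run the cell again.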

You can find more detailed instructions here.

Data

Datasets used:

  • The Living Machines atypical animacy dataset, freely available here.

  • MuSe: The Musical Sentiment Dataset Muse

  • A historical dataset on popular baby names in the United States from 1880 onwards. Available here.

  • A sample of British Library 19th Century Books collected from here.

  • A sample of British Newspapers articles, digitized by Heritage Made Digital.

Background reading (optional):

Advanced reading list (optional):

Other Resources

This course is based upon many previous resources. Apart from the ones above:
