This tutorial on Text Analysis was developed for CDCS by Dave Elsmore (Edina). This workshop uses the programming language Python to perform some common text analysis tasks. The aim is that, by running the pre-supplied code examples and editing them to examine the results, you will begin to gain an understanding of the techniques used. Even if much of the code seems unfathomable at first, you can return to Noteable and re-use this Notebook with your own texts. The topics covered are:
- N-Grams
- Sentiment Analysis
- Topic Modelling
- Visualizing your results
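To give a flavour of the first topic, here is a minimal sketch of counting bigrams (N-grams with N = 2) using only the Python standard library. It is not taken from the workshop Notebook: the `ngrams` helper, the sample sentence and the naive whitespace tokenisation are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text; the workshop uses its own datasets.
text = "the cat sat on the mat and the cat slept"

# Naive tokenisation: lower-case the text and split on whitespace.
tokens = text.lower().split()

bigrams = ngrams(tokens, 2)
bigram_counts = Counter(bigrams)

# Show the most frequent bigram and how often it occurs.
print(bigram_counts.most_common(1))
```

Changing the second argument of `ngrams` to 3 would produce trigrams instead.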
Inside this repository you will find a Jupyter Notebook that teaches you how to perform advanced text analysis, along with a series of datasets (.txt and .csv files). If you want more information on how to use RegEx (Regular Expressions) via Python, you can have a look at this module.
If you are part of the University of Edinburgh, you can use Noteable, the cloud-based computational notebook system, which works in your browser from any device.
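As a rough illustration of how .txt and .csv files can be read in Python, the sketch below uses only the standard library. The file names, the column names and the file contents are hypothetical stand-ins created by the example itself so that it runs anywhere; the datasets in this repository will differ.

```python
import csv
import tempfile
from pathlib import Path

# Create two small stand-in files (hypothetical names and contents).
workdir = Path(tempfile.mkdtemp())
txt_path = workdir / "example.txt"
csv_path = workdir / "example.csv"
txt_path.write_text("This is a short example text.", encoding="utf-8")
csv_path.write_text("id,text\n1,first row\n2,second row\n", encoding="utf-8")

# A .txt file can be read into a single string.
raw_text = txt_path.read_text(encoding="utf-8")

# A .csv file can be read row by row; DictReader uses the header line as keys.
with csv_path.open(newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))

print(len(raw_text.split()), "words in the text file")
print(len(rows), "rows in the CSV file")
```

With your own data, you would skip the lines that create the stand-in files and point `txt_path` and `csv_path` at the files you uploaded.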
To get started:
- Download the files listed on the right to a location on your computer. Make sure you know where they have been downloaded to (usually your 'Downloads' folder), as you will need to upload them to Noteable. The filename should end with '.ipynb'; if your computer has appended '.txt' to the end of the file, make sure this is removed.
- Open the following link in a new tab: https://noteable.edina.ac.uk/login
- Log in with your EASE credentials
- Under 'Standard Notebook' click 'Start'
- From the Noteable home page, click on the 'Upload' button at the top right of the screen and browse to one of the files you saved earlier to select it.
- Now click the blue 'Upload' button to load it into Noteable
- Once the file has been uploaded, click on the filename to start the Notebook
Python is great for general-purpose programming and is a popular language for scientific computing as well. Installing all of the packages required for this lesson individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., Python 3.6).
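One way to confirm which Python version you are running is a quick check like the sketch below, which can be run in a notebook cell or a Python prompt. The error message wording is our own; only the standard `sys` module is used.

```python
import sys

# Print the full version string of the interpreter that is running this code.
print(sys.version)

# Stop early with a clear message if this is not Python 3.
if sys.version_info.major < 3:
    raise RuntimeError("This workshop needs Python 3.x; please install Python 3.")

print("Python", sys.version_info.major, "detected: OK")
```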
Windows - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for Windows.
- Double-click the executable and install Python 3 using most of the default settings. The only exception is to check the 'Make Anaconda the default Python' option.
macOS - Video tutorial
- Open anaconda.com/download with your web browser.
- Download the Python 3 installer for macOS.
- Install Python 3 using all of the defaults for installation.
To start Jupyter Notebook, open the Anaconda Navigator and launch Jupyter Notebook:
- Download the notebook to your machine
- Go to Upload
- Navigate to where you have downloaded your file
- Select Upload again
- Double click on the uploaded file
Open Google Colab: https://colab.research.google.com. If you are not already logged in, you will be prompted to log in with your Gmail account.
- Go to the GitHub tab, paste the link to the notebook you want to use and press Enter
The Notebook contains paragraphs of explanatory text interspersed with grey cells containing code blocks. To run a code block and see the result:
- Place your cursor within the cell
- Click the 'Run' button on the top menu
- The results of running this code will appear below
- If the results don't appear immediately, check the icon in the browser tab. An egg-timer icon indicates that the code is still being processed.
- It is best to follow the Notebook from top to bottom as some code blocks will depend on results from previous cells
- You can edit code blocks yourself and run them to see the results of your changes
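The dependency between cells mentioned above can be pictured with the following sketch, where the comments mark what would be two separate notebook cells. The variable names and the sample sentence are made up for illustration and do not come from the workshop Notebook.

```python
# --- Cell 1: define a variable ---
sample_text = "Text analysis with Python"

# --- Cell 2: uses the variable defined in Cell 1 ---
# Running this cell before Cell 1 would raise a NameError,
# because sample_text would not exist yet.
words = sample_text.split()
print(len(words), "words:", words)
```

If you edit `sample_text` in the first cell, you need to re-run both cells to see the updated result.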
To clear the results and run the code again, you can use the 'Cell' menu on the top menu bar:
- To clear the results of the current cell: Cell > Current Outputs > Clear
- To clear the results of all cells: Cell > All Output > Clear