<a href="https://colab.research.google.com/github/scskalicky/LING-226-vuw/blob/main/09_The_Current.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Te Papa - data from *The Current*

Use this notebook to get access to text answers from [The Current exhibit hosted by Te Papa](https://www.tepapa.govt.nz/discover-collections/read-watch-play/current).

For each of the prompts used as part of the exhibit at Te Papa, people answered how they felt about the question, and then (optionally) typed in a response to that question. Te Papa has made this data available [here](https://catalogue.data.govt.nz/dataset/te-au-the-current-te-papa-nature-debate-public-response-data).

There are, however, many low-quality responses and responses with no text. To make life easier, I have downloaded the data and cleaned it so that it only includes comments at least 6 words long.

There is a separate `.txt` file for each question. In the `.txt` file, each line has the numeric rating followed by the comment. The numeric rating and the comment are separated by a tab (`'\t'`).

This notebook shows you how to load the data into Colab. Below, you will find screenshots for each question, followed by cells loading in data associated with that question. If and when you choose to analyze this data later on, you can copy these cells into another notebook to load the data there. You can also use other methods to load the data if you wish.
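As one such alternative, a file can be fetched with Python's standard library instead of the `!wget` cells used in this notebook. The sketch below is only an illustration: `fetch_lines` is a hypothetical helper name, and it assumes the file is UTF-8 text.

```python
from urllib.request import urlopen


def fetch_lines(url):
    """Download a text file and return its lines as a list of strings.

    Hypothetical helper; assumes the file is encoded as UTF-8.
    """
    # urlopen returns a response object that can be used in a 'with' block
    with urlopen(url) as response:
        text = response.read().decode('utf-8')
    # remove the trailing newline, then split into one string per line
    return text.rstrip().split('\n')
```

Calling `fetch_lines()` with one of the raw GitHub URLs from the `!wget` cells should give the same kind of list that the cells in this notebook create, without saving a file to the notebook environment.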


## **How the data loading works**

For each question, the notebook first loads in the raw data to the notebook environment. This makes it available to the Colab notebook.

Then, the data is saved to a variable, which makes the data available in the Python environment.

Each text file is in this format:

```
rating1 \t comment1 \n
rating2 \t comment2 \n
...
rating100 \t comment100 \n
...
```

So, each line is separated by a newline character (`\n`), and each rating/comment pair is separated by a tab character (`\t`).

We can use this information to split the lines from other lines, and also split the ratings from the comments, both using `.split()`.
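Before working with a real file, the two splits can be tried on a small string. This is only a sketch: the two lines in `sample_text` are made up for illustration and are not real responses from the exhibit.

```python
# A made-up two-line sample in the same format as the data files
# (rating, tab, comment, newline); these are not real responses.
sample_text = "0\tI think this is a really good idea\n4\tThis would make my life much harder\n"

# Remove the trailing newline, then split the text into one string per line
sample_lines = sample_text.rstrip().split('\n')

# Split each line on the tab to separate the rating from the comment
sample_pairs = [line.split('\t') for line in sample_lines]

print(sample_lines)
print(sample_pairs)
```

Note that the ratings are still strings (`'0'`, `'4'`) after splitting; they would need `int()` before being used as numbers.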

See below how this is done for the first question of The Current.



# TP001 (Petrol cars should be banned by 2030)

People saw this screen at Te Papa, selected which emotion matched their response, then optionally typed in a comment explaining why they chose a particular option.

The responses are coded as 0, 1, 2, 3, or 4, and correspond to this image, from left to right (i.e., 0 = excited, 4 = angry):



<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/df21c3ed-3335-4de2-9ebd-0bccf32d8576/download/tp001-screenshot-the-current-topic-question-petrol-cars-should-be-banned-by-2030.png">


These code cells explain how to get the data associated with this question.

First, we can use `!wget` to download data from a URL. `wget` is not a Python function, but a command-line program on the underlying Linux/Unix server the notebook runs on. The `!` tells the code cell to run the line as a terminal command, rather than as Python code.

Don't worry if you are unsure what this means. If you want to know more, you can read about it here: https://www.geeksforgeeks.org/wget-command-in-linux-unix/

In [None]:
# load the TP001 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp001.txt'

Now that the data is loaded into the notebook environment, we can access it using Python to load it into a variable.

In the cell below, the `.txt` file is read into memory, any trailing whitespace/newlines are removed, and the text is split on newlines.

In [None]:
# open the text file and split on newlines
# rstrip is to clean the final newline that exists in each file.
tp001 = open('tp001.txt').read().rstrip().split('\n')

Reading in the data this way creates a list of strings, where each string holds a rating and a comment separated by a tab. Looking at the first ten items, we can see how this works:

In [None]:
# look at the first ten members of the list:
tp001[:10]

We might want only the comments for text analysis (although the ratings will be useful later!). Split each line on the tab character (`.split('\t')`) to get a list of just the comments.

In [None]:
# Create a list of just the comments (splitting on tab separates the rating from comment)
# the second element of each split (index [1]) is the comment
tp001_comments = [comment.split('\t')[1] for comment in tp001]

Now we have a list of different comments made in response to the exhibit!

Remember, I cleaned out comments that were fewer than 6 words long. Doing this removed a lot of low-quality answers, but there are likely still other low-quality responses, such as the second response in this list:

In [None]:
# inspect the first ten answers:
tp001_comments[:10]
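The exact cleaning script is not shown in this notebook, but a length filter like the one described above could look something like the sketch below. It assumes that a "word" is anything separated by whitespace; `has_enough_words` is a hypothetical helper name, and the sample comments are made up.

```python
def has_enough_words(comment, minimum=6):
    """Return True if the comment has at least `minimum` whitespace-separated words."""
    return len(comment.split()) >= minimum


# Made-up comments for illustration; these are not real responses.
sample_comments = [
    "yes",
    "I think this is a really good idea",
    "asdf asdf asdf asdf asdf asdf",
]

# Keep only the comments that pass the length check
kept = [comment for comment in sample_comments if has_enough_words(comment)]
print(kept)
```

The third made-up comment passes the filter even though it is nonsense, which shows why a simple word count cannot catch every low-quality response.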

Now, you may also want the ratings. To keep them, you could create an additional list with the ratings, or find some other way to keep them in the same data object, such as creating a list of lists: `[[rating, comment], [rating, comment], ...]`
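The list-of-lists idea can be sketched as follows. The lines in `sample_lines` are made up and simply stand in for a list such as `tp001`. Two details here are my own choices rather than something the data requires: the rating is converted to an integer, and `.split('\t', 1)` splits only on the first tab so that a comment containing a tab would not be cut short. The `Counter` step at the end is an extra illustration of one thing the ratings make possible.

```python
from collections import Counter

# Made-up lines standing in for the list created from a data file (e.g. tp001);
# these are not real responses.
sample_lines = [
    "0\tI would love to see cleaner air",
    "4\tElectric cars cost far too much money",
    "0\tThis is good news for the climate",
]

# Build a list of [rating, comment] pairs
rated_comments = []
for line in sample_lines:
    # split once, on the first tab only
    rating, comment = line.split('\t', 1)
    rated_comments.append([int(rating), comment])

# Count how many comments were given each rating
rating_counts = Counter(rating for rating, comment in rated_comments)

print(rated_comments[0])
print(rating_counts)
```

Keeping each rating next to its comment means the two cannot get out of step if the list is later filtered or sorted.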


Below, you will find screenshots for the other questions, which provide you with the labels for the numeric ratings, as well as code cells loading the text in and creating lists of the comments.
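Because the loading steps are identical for every question, they could also be wrapped in a single function. The sketch below is one possible version, not part of the original notebook: `load_current` is a hypothetical name, and it assumes the file is UTF-8 text in which every non-empty line contains a rating, a tab, and a comment.

```python
def load_current(path):
    """Read one question file and return (ratings, comments) as two lists.

    Hypothetical helper. Assumes UTF-8 text where every non-empty line
    is 'rating<TAB>comment'; a line without a tab raises a ValueError.
    """
    ratings, comments = [], []
    # 'with' closes the file once reading is finished
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines
            # split once, on the first tab only
            rating, comment = line.split('\t', 1)
            ratings.append(int(rating))
            comments.append(comment)
    return ratings, comments
```

Once the `!wget` cell for a question has been run, `ratings, comments = load_current('tp008.txt')` would do the work of the two cells that read the file and extract the comments, while also keeping the ratings.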


# TP008 (workplaces should give back one day a month for nature)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/184490d0-89ba-46af-86db-9b1667a0b808/download/tp008-screenshot-the-current-topic-question-nature-helps-us-get-through-lockdowns.png">

In [None]:
# load the TP008 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp008.txt'

In [None]:
# read in the entire file
tp008 = open('tp008.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp008_comments = [comment.split('\t')[1] for comment in tp008]

In [None]:
# look at the first ten comments
tp008_comments[:10]

# TP009 (we should protect kauri, potentially by not entering kauri forests)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/6d100d05-5563-4895-866e-375c3ff6272e/download/tp009-screenshot-the-current-topic-question-we-should-protect-kauri.png">

In [None]:
# load the TP009 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp009.txt'

In [None]:
# read in the entire file
tp009 = open('tp009.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp009_comments = [comment.split('\t')[1] for comment in tp009]

In [None]:
# look at the first ten comments
tp009_comments[:10]

# TP010 (we should not plant myrtles so we can protect pōhutukawa)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/dd42b67f-c7fe-4100-ada7-7cf7dcf24d29/download/tp010-screenshot-of-the-current-topic-question-on-myrtle-rust.png">

In [None]:
# load the TP010 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp010.txt'

In [None]:
# read in the entire file
tp010 = open('tp010.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp010_comments = [comment.split('\t')[1] for comment in tp010]

In [None]:
# look at the first ten comments
tp010_comments[:10]

# TP011 (tourism should be limited to protect NZ environment)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/97145c7c-39ca-4d61-ad8a-4d4a4b99028e/download/tp011-screenshot-of-the-current-topic-question-on-when-our-borders-reopen-we-should-limit-touris.png">

In [None]:
# load the TP011 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp011.txt'

In [None]:
# read in the entire file
tp011 = open('tp011.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp011_comments = [comment.split('\t')[1] for comment in tp011]

In [None]:
# look at the first ten comments
tp011_comments[:10]

# TP012 (gene editing instead of poison should be considered to control wasps)

<img  style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/d8d9f14e-3e32-40b7-86cc-0a659aad3a70/download/tp012-screenshot-the-current-topic-waps-gene-editing.png">

In [None]:
# load the TP012 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp012.txt'

In [None]:
# read in the entire file
tp012 = open('tp012.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp012_comments = [comment.split('\t')[1] for comment in tp012]

In [None]:
# look at the first ten comments
tp012_comments[:10]

# TP002 (supermarkets should only sell sustainably caught fish)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/66f42715-0b29-4cb2-91e3-fd461c991855/download/tp002-screenshot-the-current-topic-sustainably-caught-fish.png">

In [None]:
# load the TP002 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp002.txt'

In [None]:
# read in the entire file
tp002 = open('tp002.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp002_comments = [comment.split('\t')[1] for comment in tp002]

In [None]:
# look at the first ten comments
tp002_comments[:10]

# TP005 (everyone should be allowed to catch as much whitebait as they like)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/28202b85-33d9-4e69-8d9e-9045e58bf602/download/tp005-screenshot-the-current-topic-catch-whitebait.png">

In [None]:
# load the TP005 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp005.txt'

In [None]:
# read in the entire file
tp005 = open('tp005.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp005_comments = [comment.split('\t')[1] for comment in tp005]

In [None]:
# look at the first ten comments
# these ones are particularly...bad?
tp005_comments[:10]

# TP017 (freedom camping should be banned)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/d0c30685-41b4-4492-ad62-f558a29c5bf5/download/tp017-screenshot-the-current-topic-question-freedom-camping.png">

In [None]:
# load the TP017 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp017.txt'

In [None]:
# read in the entire file
tp017 = open('tp017.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp017_comments = [comment.split('\t')[1] for comment in tp017]

In [None]:
# look at the first ten comments
# these ones are particularly...bad?
tp017_comments[:10]

# TP003 (cats should be indoor-only pets)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/973e4b3f-9ae4-4d6f-838b-555451b750f5/download/tp003-screenshot-the-current-topic-question-cats-indoor-only.png">

In [None]:
# load the TP003 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp003.txt'

In [None]:
# read in the entire file
tp003 = open('tp003.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp003_comments = [comment.split('\t')[1] for comment in tp003]

In [None]:
# look at the first ten comments
tp003_comments[:10]

# TP007 (single use plastics should be banned by 2025)

<img style="margin:auto; max-height:100%; display:block" src="https://catalogue.data.govt.nz/dataset/72ee59f3-cafb-4e3e-b8b7-01094c616216/resource/7280e3c0-d500-4f8f-b960-211caeb4d1ca/download/tp007-screenshot-the-current-topic-single-use-plastics.png">

In [None]:
# load the TP007 data to the notebook environment
!wget 'https://raw.githubusercontent.com/scskalicky/LING-226-vuw/main/the-current/tp007.txt'

In [None]:
# read in the entire file
tp007 = open('tp007.txt').read().rstrip().split('\n')

In [None]:
# extract the comments
tp007_comments = [comment.split('\t')[1] for comment in tp007]

In [None]:
# look at the first ten comments
tp007_comments[:10]