# Counting Survey Responses

Data comes in all shapes and sizes. Extracting useful insights from it often involves preparation and *cleaning*. Usually, data is not directly available in our Python code and is instead stored in files or databases. One common file type for storing data is **JSON** (JavaScript Object Notation). Python comes with a built-in library for reading `.json` files, which we can use to load the data for further analysis.

Let's say a colleague recently undertook a survey on Generative AI chatbot use by students in their department. The survey software they used asked the following question and invited responses via the multi-checkbox seen in the image below:

<center><img src="../Resources/survey_checkbox.png" style="height:300px" /></center>

The survey software captured fifty responses in total and the researcher exported the survey data to a JSON file. A short snippet of the JSON output can be seen in the image below.

<center><img src="../Resources/survey_responses.png" style="height:300px" /></center>

Each response is captured as a list, so:

- The first respondent selected **ChatGPT and Claude**
- The second selected only **ChatGPT**
- The third selected **Claude, Gemini and ChatGPT**

It could be argued that this was not the best way to capture data for further analysis, but unfortunately, we have to work with what we have. The researcher would like to determine **how many of the respondents are using ChatGPT**.

In this exercise, you will write code to load and clean the data, and answer the researcher's question.

**The data for this exercise is `survey_responses.json`, located in the `Data` directory within the `1_Monday` directory.**

## Breaking the task down into steps

When planning to tackle a task, large or small, it is a good idea to first outline the structure of the code before writing any of it. At this stage we think about how best to approach the problem and how to break it down into smaller problems.

This notebook divides the development process into three tasks. Each task description includes the necessary information to proceed with the coding. This is an example approach to tackling this problem.

**Task 1** - Load the data from the JSON file into a Python data structure.

**Task 2** - Clean the data to prepare it for analysis.

**Task 3** - Analyse the cleaned data to determine how many respondents are using ChatGPT.

## Task 1 - Data Loading

* The first step to load data is to determine where our data is located. In this exercise, we were told above:

    > **The data for this exercise is `survey_responses.json`, located in the `Data` directory within the `1_Monday` directory.**

    Using the *Explorer* in the left sidebar, see if you can find the `survey_responses.json` file now.

    Did you find it? Notice that the `Data` directory is located in the same directory as the Notebook we are currently working in. Because of this, we say that, *relative* to this Notebook, the file we want to load is located in a directory called `Data`. To indicate that we are looking into a directory, we use a forward slash `/`. **Note** that this is true for the Codespace and macOS, whereas Windows machines usually use a backslash `\`.

    With this in mind, we can write out the *relative path* to the file we are interested in. We might create a *constant* to store the path - by convention, we use all uppercase lettering to define constants.

    ```python
    PATH_TO_FILE = "Data/survey_responses.json"
    ```

* Once we know the path to our file, we can open the file in Python. The most common way to do this is with the `with` statement and the `open` function: the file object returned by `open` acts as a *context manager*. The `as` clause of the `with` statement assigns the file object to a variable (here called `file`), and the file is automatically closed at the end of the indented block.

    ```python
    with open(PATH_TO_FILE) as file:
        print(file) # for example, or do something else with the file
    ```

* Once we know how to open the file, let's consider how to read the JSON content within. For this, Python includes a *built-in* library called `json`. To use the library, we first have to import it into our Notebook.

    ```python
    import json
    ```

    Then we can use the `load` function of the library to read JSON data into a variable.

    ```python
    data = json.load(file)  # `file` is the open file object from the `with` block
    ```
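
The three steps above can be seen working together in the sketch below. It is an illustration rather than the exercise solution: it writes a tiny sample file into a temporary directory and reads it back. The sample holds the first three responses from the snippet shown earlier, and the names `sample`, `sample_path` and `loaded` are made up for this example. The path is built with `pathlib.Path`, which uses the correct separator for the operating system.

```python
import json
import tempfile
from pathlib import Path

# Sample data: the first three responses shown in the snippet above.
sample = [["ChatGPT", "Claude"], ["ChatGPT"], ["Claude", "Gemini", "ChatGPT"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Path joins directory and file names with the separator the OS expects.
    sample_path = Path(tmp_dir) / "sample_responses.json"

    # Write the sample so that there is something to load.
    with open(sample_path, "w") as file:
        json.dump(sample, file)

    # Open the file and parse its JSON content into Python lists.
    with open(sample_path) as file:
        loaded = json.load(file)

print(loaded)
print(file.closed)  # the file is closed once the with block has ended
```

Note that the JSON arrays come back as ordinary Python lists, so `loaded` can be indexed and looped over like any other list.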

### Coding

Can you bring together the approaches above to load the content of a JSON file into a Python variable?

Write your code in the cell below:

In [None]:
import json

PATH_TO_FILE = "Data/survey_responses.json"

with open(PATH_TO_FILE) as file:
    survey_responses = json.load(file)

print(survey_responses)

[['ChatGPT', 'Claude'], ['ChatGPT'], ['Claude', 'Gemini', 'ChatGPT'], ['Copilot', 'ChatGPT'], ['ChatGPT', 'Grok'], ['Claude', 'ChatGPT', 'Gemini', 'Copilot'], ['Gemini'], ['ChatGPT', 'Claude', 'Copilot'], ['Claude', 'Grok'], ['ChatGPT', 'Gemini'], ['Copilot'], ['ChatGPT', 'Claude', 'Gemini', 'Copilot', 'Grok'], ['Claude', 'ChatGPT'], ['Gemini', 'Grok'], ['ChatGPT'], ['Claude', 'Copilot', 'Gemini'], ['ChatGPT', 'Grok'], ['Claude'], ['ChatGPT', 'Copilot'], ['Gemini', 'ChatGPT'], ['Claude', 'Grok', 'Gemini'], ['ChatGPT', 'Copilot', 'Claude'], ['Grok'], ['ChatGPT', 'Gemini', 'Grok'], ['Claude', 'Copilot'], ['ChatGPT'], ['Gemini', 'Claude', 'ChatGPT', 'Grok'], ['Copilot', 'Grok'], ['ChatGPT', 'Claude', 'Gemini'], ['Claude', 'Copilot', 'Grok'], ['ChatGPT', 'Gemini'], ['Grok', 'Claude'], ['ChatGPT', 'Copilot', 'Gemini', 'Grok'], ['Claude'], ['ChatGPT', 'Grok', 'Copilot'], ['Gemini', 'Claude'], ['ChatGPT'], ['Claude', 'Gemini', 'Grok', 'Copilot'], ['ChatGPT', 'Claude'], ['Copilot', 'Gemini'], 

Check your work - if you print the new variable you've created, the output should look similar to this:

```bash
[['ChatGPT', 'Claude'], ['ChatGPT'], ['Claude', 'Gemini', 'ChatGPT'], ['Copilot', 'ChatGPT'], ...
```

For each of the exercises in this notebook, sample solutions can be found in [```Sample Solutions/Sample Solutions 2 - Survey Responses.ipynb```](Sample%20Solutions/Sample%20Solutions%202%20-%20Survey%20Responses.ipynb).

## Task 2 - Data Cleaning

Notice that the data you have loaded is a list, which itself contains other lists. You could write code to start counting the `"ChatGPT"` elements within each *inner* list, but this would add additional complexity. A better approach might be to *flatten* the current structure into a single list before starting to count.

### Coding

Using loops and lists, can you create a new flat list which contains all the elements of the *inner* lists as individual elements? Use the code cell below.

In [None]:
flatten_survey = []
for mini_list in survey_responses:
    for response in mini_list:
        flatten_survey.append(response)

print(flatten_survey)

['ChatGPT', 'Claude', 'ChatGPT', 'Claude', 'Gemini', 'ChatGPT', 'Copilot', 'ChatGPT', 'ChatGPT', 'Grok', 'Claude', 'ChatGPT', 'Gemini', 'Copilot', 'Gemini', 'ChatGPT', 'Claude', 'Copilot', 'Claude', 'Grok', 'ChatGPT', 'Gemini', 'Copilot', 'ChatGPT', 'Claude', 'Gemini', 'Copilot', 'Grok', 'Claude', 'ChatGPT', 'Gemini', 'Grok', 'ChatGPT', 'Claude', 'Copilot', 'Gemini', 'ChatGPT', 'Grok', 'Claude', 'ChatGPT', 'Copilot', 'Gemini', 'ChatGPT', 'Claude', 'Grok', 'Gemini', 'ChatGPT', 'Copilot', 'Claude', 'Grok', 'ChatGPT', 'Gemini', 'Grok', 'Claude', 'Copilot', 'ChatGPT', 'Gemini', 'Claude', 'ChatGPT', 'Grok', 'Copilot', 'Grok', 'ChatGPT', 'Claude', 'Gemini', 'Claude', 'Copilot', 'Grok', 'ChatGPT', 'Gemini', 'Grok', 'Claude', 'ChatGPT', 'Copilot', 'Gemini', 'Grok', 'Claude', 'ChatGPT', 'Grok', 'Copilot', 'Gemini', 'Claude', 'ChatGPT', 'Claude', 'Gemini', 'Grok', 'Copilot', 'ChatGPT', 'Claude', 'Copilot', 'Gemini', 'ChatGPT', 'Grok', 'Claude', 'ChatGPT', 'Copilot', 'Gemini', 'ChatGPT', 'Claude'
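
Nested loops are not the only way to flatten a list of lists. As a hedged sketch, the example below shows two common alternatives on a small sample (the first four responses from the expected output above): a list comprehension and `itertools.chain.from_iterable`. The variable names are illustrative only.

```python
from itertools import chain

# Sample data: the first four responses shown in the expected output above.
sample_responses = [
    ["ChatGPT", "Claude"],
    ["ChatGPT"],
    ["Claude", "Gemini", "ChatGPT"],
    ["Copilot", "ChatGPT"],
]

# List comprehension: the same two loops, written in a single expression.
flat_comprehension = [tool for response in sample_responses for tool in response]

# chain.from_iterable joins the inner lists end to end.
flat_chain = list(chain.from_iterable(sample_responses))

print(flat_comprehension)
print(flat_comprehension == flat_chain)
```

Both produce the same flat list as the explicit loops; which one to use is mostly a matter of readability.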

## Task 3 - Data Analysis

Now that we have clean data, we can more easily answer our researcher's question:

- How many of the respondents are using ChatGPT?

### Coding

Using the *accumulator pattern*, can you answer the researcher's question?

In [None]:
count = 0
for response in flatten_survey:
    if response == 'ChatGPT':
        count += 1

print(count)

30
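
For comparison, here is a sketch of two shorter ways to get the same kind of count, shown on a small sample (the first four responses from the data). The first uses the built-in `list.count` method on the flat list. The second skips the flattening step and counts the respondents whose list contains the tool. The variable names are assumptions made for this example.

```python
# Sample data: the first four responses from the survey.
sample_responses = [
    ["ChatGPT", "Claude"],
    ["ChatGPT"],
    ["Claude", "Gemini", "ChatGPT"],
    ["Copilot", "ChatGPT"],
]
flat = [tool for response in sample_responses for tool in response]

# list.count runs the accumulator loop for us on the flat list.
count_flat = flat.count("ChatGPT")

# Counting respondents directly: each True adds 1 to the sum.
count_respondents = sum("ChatGPT" in response for response in sample_responses)

print(count_flat, count_respondents)
```

The two numbers agree here because a checkbox can only be ticked once per respondent. If a tool could appear twice in one response, only the second approach would still count respondents.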


## Task 4 - Reusability

Great! We've answered the question. The code above works well to count the number of appearances of **ChatGPT** in the survey responses. But what if our researcher wants to check the number of student respondents that used **Claude**?

What feature of Python could you use to make your code above reusable? 

### Coding

Use the code cell below to code up a function that could be used to count the occurrences of any model in any list.

In [None]:
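def count_response(name, responses):
    # `responses` is a flat list of tool names; avoid naming it `list`,
    # which would hide Python's built-in list type inside the function
    count = 0
    for response in responses:
        if response == name:
            count += 1
    return count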

print(count_response("ChatGPT", flatten_survey))
print(count_response("Claude", flatten_survey))

30
26
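
If the counts for every tool are needed at once, calling a function once per tool gets repetitive. One option, sketched below on a small sample (the first four responses, already flattened), is `collections.Counter` from the standard library, which tallies every distinct element in a single pass. The names `flat` and `tool_counts` are illustrative.

```python
from collections import Counter

# Sample data: the first four survey responses, already flattened.
flat = [
    "ChatGPT", "Claude",
    "ChatGPT",
    "Claude", "Gemini", "ChatGPT",
    "Copilot", "ChatGPT",
]

# Counter builds a dictionary-like object mapping each tool to its count.
tool_counts = Counter(flat)

print(tool_counts["ChatGPT"])
print(tool_counts["Grok"])  # a tool that never appears counts as 0

# most_common() lists the tools from most to least selected.
for tool, n in tool_counts.most_common():
    print(tool, n)
```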


## Extension - Counts Challenge

Now our researcher wants to know **how many respondents** selected **each number of options**. How many respondents chose 0, 1, 2, 3, 4, and 5 options?

For example, consider the responses below.
```bash
[['ChatGPT', 'Claude'], ['ChatGPT'], ['Claude', 'Gemini', 'ChatGPT'], ['Copilot', 'ChatGPT'], ...
```
In this example, the *option counts* are:

```plain
0 options: 0
1 options: 1
2 options: 2
3 options: 1
```

Looking at the original dataset, determine and display all the *option counts*.


### Coding

Use the code cell below to answer the question.

In [None]:
# options[n] holds the number of respondents who selected n options
options = [0, 0, 0, 0, 0, 0]
for mini_list in survey_responses:
    # The list length is the number of boxes ticked; it doubles as the index.
    # As before, anything longer than 5 is grouped under index 5.
    options[min(len(mini_list), 5)] += 1
        
print("0 options: ", options[0])
print("1 options: ", options[1])
print("2 options: ", options[2])
print("3 options: ", options[3])
print("4 options: ", options[4])
print("5 options: ", options[5])

0 options:  0
1 options:  12
2 options:  19
3 options:  12
4 options:  5
5 options:  2
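
As an alternative sketch, the same tally can be built with `collections.Counter` by counting the length of each response. The example uses the four sample responses from the challenge description, so its output matches the example *option counts* given there. `MAX_OPTIONS` is a name introduced for this example and reflects the five tools that appear in the data.

```python
from collections import Counter

# Sample data: the four responses from the challenge description.
sample_responses = [
    ["ChatGPT", "Claude"],
    ["ChatGPT"],
    ["Claude", "Gemini", "ChatGPT"],
    ["Copilot", "ChatGPT"],
]

MAX_OPTIONS = 5  # five different tools appear in the survey data

# Count how many responses have each length.
length_counts = Counter(len(response) for response in sample_responses)

# Lengths that never occur are reported as 0 by the Counter.
for n in range(MAX_OPTIONS + 1):
    print(f"{n} options: {length_counts[n]}")
```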


## Extension - Pairs Challenge

Finally, our researcher wants to know which tools are often **selected together** in the same response.

For example, consider the responses below.
```bash
[['ChatGPT', 'Claude'], ['ChatGPT'], ['Claude', 'Gemini', 'ChatGPT'], ['Copilot', 'ChatGPT'], ...
```
In this example, the *pair counts* are:

```plain
(ChatGPT, Claude): 2
(ChatGPT, Gemini): 1
(Claude, Gemini): 1
(ChatGPT, Copilot): 1
```

Looking at the original dataset, which **pairs of tools** appear together most frequently? Consider displaying all the *pair counts* to show this.

### Coding

Use the code cell below to answer the question.

In [40]:
def count_pairs(pair, responses):
    # Count the responses in which both tools of the pair were selected
    pair_count = 0
    for mini_list in responses:
        if pair[0] in mini_list and pair[1] in mini_list:
            pair_count += 1

    return pair_count

models = ['ChatGPT', 'Claude', 'Copilot', 'Gemini', 'Grok']

# Start j at i + 1 so each pair is visited once and no tool is paired with itself
for i in range(len(models)):
    for j in range(i + 1, len(models)):
        print(models[i], "and", models[j], "appeared together", count_pairs([models[i], models[j]], survey_responses), "times")

ChatGPT and Claude appeared together 14 times
ChatGPT and Copilot appeared together 10 times
ChatGPT and Gemini appeared together 13 times
ChatGPT and Grok appeared together 10 times
Claude and Copilot appeared together 11 times
Claude and Gemini appeared together 13 times
Claude and Grok appeared together 10 times
Copilot and Gemini appeared together 7 times
Copilot and Grok appeared together 8 times
Gemini and Grok appeared together 9 times
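
Another way to approach the pairs challenge, sketched here on the four sample responses from the challenge description, is to generate the pairs from each response with `itertools.combinations` and tally them with `collections.Counter`. This avoids listing the tool names by hand. The variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Sample data: the four responses from the challenge description.
sample_responses = [
    ["ChatGPT", "Claude"],
    ["ChatGPT"],
    ["Claude", "Gemini", "ChatGPT"],
    ["Copilot", "ChatGPT"],
]

pair_counts = Counter()
for response in sample_responses:
    # Sorting first means ("ChatGPT", "Claude") and ("Claude", "ChatGPT")
    # are treated as the same pair; set() guards against repeated tools.
    for pair in combinations(sorted(set(response)), 2):
        pair_counts[pair] += 1

# Show the pairs from most to least frequent.
for (first, second), n in pair_counts.most_common():
    print(f"({first}, {second}): {n}")
```

A response with a single tool produces no pairs, so it is skipped without any special handling.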
