# Challenges for week 1

Now that we've seen how Python and Jupyter Notebooks work and that you have read about Digital Analytics and Computational Social Science, it's time for you to combine and apply this knowledge. This week has three challenges.

Each challenge has three components:
1. **Programming**: Applying one of the Python programming or data analysis steps you learned in the tutorial
2. **Interpretation**: Explaining what you are doing and interpreting the results of the data analysis in Markdown
3. **Reflection**: Connecting these concepts with the literature of the week in a short reflection (*max 300 words*)

**Some important notes for the challenges:**
1. These challenges are a warm-up to help you get ready for class. Make sure to give all of them a try. If you get an error message, try to troubleshoot it (using Google often helps). If all else fails, move on to the next challenge (but make sure to hand it in).
2. While we of course like it when you get all the answers right, the important thing is to practice and apply the knowledge. So we will still accept challenges that are not complete, as long as we see enough effort *for each challenge*. The rubric (see Canvas) reflects this.
3. Submitting the challenge on time via the Canvas assignment is critical, as it also helps you prepare for the DA live session. Check on Canvas how to hand it in.

### Facing issues? 

We constantly monitor the issues on GitHub to help you out. Don't hesitate to log an issue there, clearly explaining the problem, showing the code you are using, and including any error message you receive.

**Important:** We only monitor the repository on weekdays, from 9.30 to 17.00. Issues logged outside these hours will most likely be answered the next day. This means you should not wait for our response before submitting a challenge :-)


## Challenge 1


### Programming challenge

Find out how to add (1) an image and (2) a link in a Markdown cell. You may need to search for this online. Then include an image and a link in your notebook.


### Interpretation

Write a short text aimed at another student who does not know what Markdown is, and explain step by step how they can add an image and a link in Markdown.


### Reflection

You are now working with Jupyter Notebooks and Python code, and submitting your notebook as HTML via Canvas. Review the article *van Atteveldt et al. (2019)* and explain how these activities relate to open science principles, and connect them to the Computational Science workflow. Motivate your response.

#### Markdown explained (programming challenge + interpretation)
Markdown is a way to easily write structured text. It is often used in a readme file in a GitHub repository to explain a project or give additional information to other developers. 

To add a link, write the link text between square brackets `[]`, followed immediately by the link URL between parentheses `()`. For example, `[This is a link](https://www.instagram.com/reppinbars/)` renders as: [This is a link](https://www.instagram.com/reppinbars/)

Adding an image is similar to adding a link. To add an image, write an exclamation mark `!`, followed by square brackets `[]` that contain the image's alt text, and finally the image source between parentheses `()`. This looks as follows:

![Image alt text](attachment:9c436fbf-d1c5-43a9-b86d-cec0ef9c8ae9.png)
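The link and image syntax above can also be assembled in Python, which makes the "image = `!` + link" relationship explicit. This is a minimal sketch: `markdown_link` and `markdown_image` are hypothetical helper names (not part of any library), and `chart.png` is a placeholder file name.

```python
def markdown_link(text, url):
    """Return Markdown link syntax: [text](url)."""
    return f"[{text}]({url})"


def markdown_image(alt_text, src):
    """Return Markdown image syntax: the link syntax prefixed with '!'."""
    return "!" + markdown_link(alt_text, src)


# Hypothetical examples; paste the printed lines into a Markdown cell to render them
print(markdown_link("This is a link", "https://www.instagram.com/reppinbars/"))
print(markdown_image("Image alt text", "chart.png"))  # 'chart.png' is a placeholder
```

Pasting the first printed line into a Markdown cell renders a clickable link. The second renders an image, provided that a file with that name exists next to the notebook.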

#### Reflection
Open science principles aim to promote the reproducibility, validity and transparency of scientific research, and to share data so that it is available to a broad range of scholars rather than a select few. Writing code in Python, documenting it in a Jupyter Notebook, and sharing it via Canvas follow these principles. The documentation I write increases the transparency of my work, and the notebook also contains the code and data needed to reproduce my results.

This mainly fits into the `dissemination` part of the Computational Science workflow. The dissemination part emphasizes documenting scientific work (from the start), using literate programming tools such as Jupyter Notebooks, and publishing work on GitHub early on.

---

## Challenge 2

### Programming challenge

Below you again have a list of visitors, specifically their referral sources. Imagine that this list of referrals to the website of a news organization is provided by Google Analytics. Create a function that categorizes the visitors according to an interesting set of categories.

Please note that a part of the code is already written - with a loop that checks each item in the list. You need to complete the function.

### Interpretation

1. Write a short text aimed at another student who does not know what a function is, and explain step by step what you are doing with the function.
2. Provide a short justification - aimed at a stakeholder in the news organization - explaining why you decided to categorize the visitors in the categories that you defined.

### Reflection

Create an overview of the specific arenas, assets, actors and actions within the digital ecosystem that are relevant here, based on the discussion proposed in the *Araujo et al. (2021)* article. Make sure to also use examples from the list to motivate your answer.

```python
visitors = ['campaign utx=9902',
            'instagram app',
            'facebook app',
            'campaign utx=1389',
            'facebook app',
            'google search',
            'newsletter',
            'facebook app',
            'newsletter',
            'campaign utx=9902',
            'instagram app',
            'campaign utx=1389',
            'newsletter',
            'facebook.com',
            'facebook app',
            'newsletter',
            'google search',
            'campaign utx=9902',
            'campaign utx=9902',
            'instagram app',
            'organic',
            'instagram app',
            'news campaign 9902',
            'instagram app',
            'facebook app',
            'facebook.com',
            'campaign utx=9902',
            'campaign utx=9902',
            'campaign utx=1389',
            'campaign utx=1389',
            'facebook.com',
            'news campaign 9902',
            'newsletter',
            'instagram app',
            'instagram app',
            'campaign utx=1389',
            'direct',
            'facebook.com',
            'newsletter',
            'direct',
            'campaign utx=1389',
            'direct',
            'organic',
            'facebook.com',
            'facebook.com',
            'facebook app',
            'campaign utx=9902',
            'google search',
            'campaign utx=9902',
            'campaign utx=1389']
```

```python
def string_contains_list_item(string, list_items):
    """Return True if any item in `list_items` is a substring of `string`."""
    return any(item in string for item in list_items)


def check_visitors(x):
    """Categorize a referral string `x` by the origin of the referral.

    The conditions are checked in order and the first match wins;
    referrals that match none of them are labelled 'Other'.
    """
    if string_contains_list_item(x, ['facebook', 'instagram']):
        return 'Social media'
    elif string_contains_list_item(x, ['google']):
        return 'Search engine'
    elif string_contains_list_item(x, ['campaign utx=']):
        return 'Advertising'
    elif string_contains_list_item(x, ['organic', 'direct']):
        return 'Organic traffic'
    elif string_contains_list_item(x, ['newsletter']):
        return 'Owned media'
    return 'Other'
```

```python
# Run this cell to see if your code worked
for visitor in visitors:
    print(check_visitors(visitor))
```

```
Advertising
Social media
Social media
Advertising
Social media
Search engine
Owned media
Social media
Owned media
Advertising
Social media
Advertising
Owned media
Social media
Social media
Owned media
Search engine
Advertising
Advertising
Social media
Organic traffic
Social media
Other
Social media
Social media
Social media
Advertising
Advertising
Advertising
Advertising
Social media
Other
Owned media
Social media
Social media
Advertising
Organic traffic
Social media
Owned media
Organic traffic
Advertising
Organic traffic
Organic traffic
Social media
Social media
Social media
Advertising
Search engine
Advertising
Advertising
```

#### Interpretation
##### 1. What is a function?
A function is a named, reusable block of code. It usually accepts arguments (inputs), applies the logic stored inside it to those arguments, and returns a result. My code above contains two functions.

The first function, `string_contains_list_item`, receives two arguments: `string` and `list_items`, a list whose items are also strings. It checks whether any of the strings in `list_items` is a substring of `string`, that is, whether it appears somewhere inside it. It returns one of two values: `True` if at least one list item appears in the string, and `False` if no match is found.
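To make the one-line `any(...)` expression more concrete, here is an equivalent version written as an explicit `for` loop, shown next to the original. The name `string_contains_list_item_loop` is hypothetical and used only for this comparison.

```python
def string_contains_list_item(string, list_items):
    # Original version: any() stops at the first item that is a substring of `string`
    return any(item in string for item in list_items)


def string_contains_list_item_loop(string, list_items):
    # Hypothetical explicit-loop equivalent, for illustration only
    for item in list_items:
        if item in string:   # `in` between two strings tests for a substring
            return True      # stop as soon as one match is found
    return False             # no item matched


print(string_contains_list_item("facebook app", ["facebook", "instagram"]))        # True
print(string_contains_list_item_loop("google search", ["facebook", "instagram"]))  # False
```

Both versions stop at the first match. The `any(...)` form is simply the more compact and idiomatic way to write the loop.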

The second function, `check_visitors`, receives one argument, `x`. This argument should be a string that describes a visitor's referral. The function then works through a series of `if`/`elif` statements, each of which checks whether any item from a list of keywords is a substring of `x`. To do this, it calls the previously defined `string_contains_list_item` function. As soon as a condition is met, the function returns the corresponding category. If none of the conditions match, it returns `'Other'`.
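The `if`/`elif` chain can also be expressed as data. The sketch below is an alternative to the code used above, not a replacement. It assumes a dictionary `CATEGORY_KEYWORDS` (a hypothetical name) that maps each category to its keywords. Python dictionaries preserve insertion order (guaranteed since Python 3.7), so the first matching category wins, just as in the `if`/`elif` chain.

```python
# Hypothetical mapping; the order matters because the first match wins
CATEGORY_KEYWORDS = {
    'Social media': ['facebook', 'instagram'],
    'Search engine': ['google'],
    'Advertising': ['campaign utx='],
    'Organic traffic': ['organic', 'direct'],
    'Owned media': ['newsletter'],
}


def categorize(referral, category_keywords=CATEGORY_KEYWORDS):
    """Return the first category whose keywords appear in `referral`, else 'Other'."""
    for category, keywords in category_keywords.items():
        if any(keyword in referral for keyword in keywords):
            return category
    return 'Other'


for referral in ['facebook.com', 'campaign utx=1389', 'news campaign 9902']:
    print(referral, '->', categorize(referral))
```

As with the original function, `'news campaign 9902'` falls through to `'Other'` because it does not contain `'campaign utx='`. If the organization tags all of its campaigns this way, adding a broader keyword such as `'campaign'` to the `'Advertising'` list would be one way to group this referral with the other campaigns.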

##### 2. Justification for my categorization
I have chosen to categorize the visitors by the origin of their referral. I chose these categories because I think they give advertisers valuable information on the types of platforms and media that generate traffic. For example, it is valuable to know what share of visitors is generated by paid advertisements, and what share is "free" traffic generated by search engines.
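The justification refers to the *share* of visitors per category, so a natural next step is to count the categories. Below is a minimal sketch that uses `collections.Counter` with the same categorization logic as `check_visitors` above, written more compactly. To keep the example self-contained, it uses a short illustrative sample of referrals taken from the list rather than the full `visitors` list.

```python
from collections import Counter


def check_visitors(x):
    # Same categorization logic as above, written compactly
    if any(k in x for k in ['facebook', 'instagram']):
        return 'Social media'
    elif 'google' in x:
        return 'Search engine'
    elif 'campaign utx=' in x:
        return 'Advertising'
    elif any(k in x for k in ['organic', 'direct']):
        return 'Organic traffic'
    elif 'newsletter' in x:
        return 'Owned media'
    return 'Other'


# Illustrative sample taken from the visitors list above
sample = ['campaign utx=9902', 'instagram app', 'facebook app',
          'google search', 'newsletter', 'campaign utx=1389',
          'organic', 'news campaign 9902']

counts = Counter(check_visitors(v) for v in sample)
for category, n in counts.most_common():
    # Print each category with its count and its share of the sample
    print(f"{category}: {n} ({n / len(sample):.0%})")
```

Replacing `sample` with the full `visitors` list gives the distribution across all referrals.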

## Challenge 3

### Programming

This is the most important challenge of this tutorial, as we will use the concepts here to think about how to organize our (digital) data. The paragraph below contains a lot of information. How would you organize this data in a way that you can visualize and work with it later? Consider using lists, dictionaries, tuples... or whatever makes sense to you!

```
John Smith is 32 years old, is an analyst at Salesforce UK,
and is father to a young boy. He is divorced. 
Mary Smith is 21 years old, currently studies at the
University of Amsterdam, and was born in the US. 
Tom Brokaw is a US journalist, born in 1940, 
married with three children.
```

### Interpretation

Please explain why you decided to organize the data in the way that you did. In this explanation, consider both *technical* aspects (i.e., why lists, dictionaries, tuples or combinations thereof) and *substantive* aspects (i.e., why you structured the data in the way that you did).

### Reflection

In this challenge, you created a set of structured data (a dataset) out of what was somewhat unstructured data (a piece of text). While doing so, you made choices on how to structure these data. Please briefly reflect on some of the advantages and disadvantages of the approach you took, and what may be some (normative, ethical) implications of your choices.

#### Programming

```python
people = {
    1: {
        'name': 'John Smith',
        'age': 32,
        'occupation': {
            'type': 'Work',
            'organisation': 'Salesforce UK',
            'description': 'Analyst',
        },
        'relationship_status': 'divorced',
        'children_count': 1,
        'country_of_origin': 'unknown',
        'country_of_residence': 'United Kingdom',
    },
    2: {
        'name': 'Mary Smith',
        'age': 21,
        'occupation': {
            'type': 'Study',
            'organisation': 'University of Amsterdam',
            'description': 'Student',
        },
        'relationship_status': 'unknown',  # not stated in the text
        'children_count': None,            # not stated in the text
        'country_of_origin': 'US',
        'country_of_residence': 'The Netherlands',
    },
    3: {
        'name': 'Tom Brokaw',
        'age': 81,  # derived from the stated birth year (1940); goes stale over time
        'occupation': {
            'type': 'Work',
            'organisation': 'unknown',
            'description': 'Journalist',
        },
        'relationship_status': 'married',
        'children_count': 3,
        'country_of_origin': 'US',
        'country_of_residence': 'US',
    },
}
```

```python
print(people[1]['name'])
```

```
John Smith
```


#### Interpretation
I have chosen to create a variable `people` that consists of a top-level dictionary populated with dictionaries that each represent a person. Each person's dictionary is accessible through a unique id. I chose this approach because it allows a person's data to be retrieved by their id. A list would only allow retrieval by position, and positions shift when people are removed or inserted, so they do not work as stable ids. A dictionary also allows additional people to be added to the dataset in the future, which a tuple would not allow, since a tuple cannot be changed after it is created.
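The id-based design can be illustrated with a trimmed-down version of `people` that keeps only a few fields for brevity. The sketch looks up a person by id, adds a new person under the next free id, and deletes a person with `del`. The fourth person, "Jane Doe", is entirely hypothetical.

```python
# Trimmed-down version of `people` (only name and age kept, for brevity)
people = {
    1: {'name': 'John Smith', 'age': 32},
    2: {'name': 'Mary Smith', 'age': 21},
    3: {'name': 'Tom Brokaw', 'age': 81},
}

# Look up a person directly by their id
print(people[2]['name'])  # Mary Smith

# Add a hypothetical new person under the next free id
new_id = max(people) + 1  # max() over a dict iterates over its keys
people[new_id] = {'name': 'Jane Doe', 'age': 45}

# Delete a person by id; the ids of the remaining people do not change
del people[1]
print(sorted(people))  # [2, 3, 4]
```

In a list, by contrast, removing John would shift Mary from index 1 to index 0. This is why list positions do not work well as identifiers.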

The dictionary for each person contains data on different demographic characteristics. A person's occupation can take many different forms, and it could be valuable to store the organisation a person is affiliated with as a separate value. I therefore decided to split occupation into three values (type, organisation and description), stored in a nested dictionary. Given a larger dataset of people, this would, for example, allow me to analyze whether different people are affiliated with the same organisation, and how they are related within that organisation (e.g. student/teacher).
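The sketch below shows the kind of analysis that the nested `occupation` dictionary makes possible: grouping people by organisation with `collections.defaultdict`. It uses a small dataset in the same structure as `people`. "Person A" is hypothetical and was added only so that two people share an organisation.

```python
from collections import defaultdict

# Small dataset in the same structure as `people`; 'Person A' is hypothetical
people = {
    1: {'name': 'Mary Smith',
        'occupation': {'type': 'Study', 'organisation': 'University of Amsterdam',
                       'description': 'Student'}},
    2: {'name': 'Person A',
        'occupation': {'type': 'Work', 'organisation': 'University of Amsterdam',
                       'description': 'Teacher'}},
    3: {'name': 'John Smith',
        'occupation': {'type': 'Work', 'organisation': 'Salesforce UK',
                       'description': 'Analyst'}},
}

# Map each organisation to the (name, role) pairs of the people affiliated with it
by_organisation = defaultdict(list)
for person in people.values():
    occupation = person['occupation']
    by_organisation[occupation['organisation']].append(
        (person['name'], occupation['description']))

for organisation, members in by_organisation.items():
    print(organisation, members)
```

Because the organisation is stored as its own value rather than inside a free-text description, this grouping needs no text parsing.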

#### Reflection
One advantage of my approach is that it allows additional people to be added to the dataset in the future, as well as individuals to be deleted in a targeted way. Another advantage, as discussed in the interpretation, is that the way the occupation data is structured allows for more meaningful analyses of larger groups of people. A disadvantage is that personal data is stored in a comprehensive manner. This means that if the data were ever compromised, anyone who gained access could retrieve sensitive information about the people included in the dataset.
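One way to reduce the risk described above is data minimization: keeping only the fields that a given analysis needs and leaving out direct identifiers such as names. The sketch below assumes that the analysis only needs age and relationship status. The field selection and the helper name `minimize` are assumptions made for illustration. This is not full anonymization, because in a small dataset a combination of the remaining fields can still identify a person.

```python
# Trimmed-down records in the same structure as `people`
people = {
    1: {'name': 'John Smith', 'age': 32, 'relationship_status': 'divorced'},
    3: {'name': 'Tom Brokaw', 'age': 81, 'relationship_status': 'married'},
}


def minimize(records, keep_fields):
    """Return a new dict of records that contains only the fields in `keep_fields`."""
    return {
        person_id: {field: value for field, value in record.items() if field in keep_fields}
        for person_id, record in records.items()
    }


# Hypothetical analysis that only needs age and relationship status
analysis_data = minimize(people, keep_fields={'age', 'relationship_status'})
print(analysis_data)
```

The original `people` dictionary is left unchanged, so the full data can be kept in a more protected location while the minimized copy is shared.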