**Josh Hellings** - Practical Data Science for Economists 2024

In this notebook, we will explore some Python basics, covering data types, lists, dictionaries, and other techniques that will help kickstart the journey to using Python for Data Science.

<br>
<br>

# Python basics

## 1. Assigning and viewing variables
In Python, we create a variable by assigning it a value with the assignment operator `=`.

Variables are like containers that store data values. They let us label data with a descriptive name, which makes our code easier to read and understand. For example:

In [1]:
x = 10        # Assign variable x the integer value 10
y = x + 5     # Assign variable y the value of x + 5
z = "dog"     # Assign variable z the string value "dog"

We can print these variables to check their values:

In [2]:
print(x)

10


Or we could print them all at once:

In [3]:
print(x, y, z)

10 15 dog


**Tip**: In Python, we can add comments to the code using `#`. Any text after the `#` on that line is ignored when the code runs, so we can use comments to add helpful notes. (Shortcut to comment out the current line: CTRL + '/' on Windows, or command + '/' on Mac)

<br>
<br>

## 2. Understanding Data Types

In Python, data types are crucial as they define the operations possible on the values and how they can be stored. Here are some of the basic data types you'll frequently encounter:

- **Integers** (`int`): Whole numbers, positive or negative, without decimals. Useful for counting or indexing. For example, age = 25.

- **Floating Point Numbers** (`float`): Numbers with a decimal point. Used for measurements and any calculation that needs fractional values. For example, inflation = 1.3.

- **Strings** (`str`): A sequence of characters, enclosed in single (' ') or double (" ") quotes. Strings are how text is represented. For example, name = "John Doe".

- **Lists** (`list`): An ordered collection of items, which can be of mixed data types. Lists are versatile and can be modified (mutable). These are written as a list with comma separated values (items) between square brackets. For example, fruits = ["apple", "banana", "cherry"].

- **Dictionaries** (`dict`): A collection of key-value pairs. Each key-value pair maps the key to its associated value, making dictionaries perfect for storing data in a structured way, similar to JSON objects from APIs. For example, person = {"name": "John", "age": 25}. When we interact with JSON objects within Python, these are of dictionary type.

**Hint**: Find out more about Python data types [here](https://www.w3schools.com/python/python_datatypes.asp)

Understanding these types is essential because they determine what kind of operations you can perform on the data. For instance, you can add two integers or concatenate strings but trying to mix types without conversion (like adding a string to an integer) will lead to errors.

In [4]:
# Integer example
age = 25
print("Age:", age)

# Floating point number example
temperature = 98.6
print("Temperature:", temperature)

# String example
name = "John Doe"
print("Name:", name)

# List example
locations = ["London", "Darlington",  "Newport"]
print("Locations:", locations)

# Dictionary example
person = {"name": "John", "age": 25}
print("Person:", person)

Age: 25
Temperature: 98.6
Name: John Doe
Locations: ['London', 'Darlington', 'Newport']
Person: {'name': 'John', 'age': 25}


<br>

We can check the data type of any variable or value using `type()`:

In [8]:
type('This is string')

str

In [9]:
type(3.1)

float

In [10]:
type(3)

int
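
The earlier point about mixing types can be made concrete. The sketch below is illustrative only (the names `label`, `message` and `total` are made up for this example): it shows the `TypeError` raised when adding a string to an integer, and how the built-in `str()` and `int()` functions convert between types.

```python
age = 25
label = "Age: "

# Adding a string to an integer raises a TypeError
try:
    message = label + age
except TypeError as error:
    message = None
    print("TypeError:", error)

# Convert the integer to a string first, then concatenate
message = label + str(age)
print(message)

# Convert a numeric string to an integer before doing arithmetic
total = int("10") + 5
print(total)
```

Checking a value with `type()` first is a quick way to work out which conversion is needed.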

<br>
<br>

## 3. Lists

Lists are one of the most versatile data types in Python, allowing you to store a collection of items in a single variable. They can contain items of different types, but typically all items in a list are of the same type.

In [21]:
x = []  # Pair of square brackets creates an empty list
x = ['a']
x = ['a', 'b', 'c']     # Items in a list are separated by commas
x = [1, 'abc', 0.234]   # Lists can contain different data types (but best to stick to one type)

Let's explore how to interact with lists by editing and viewing their items, using a list of locations in Wales as our example.

In [26]:
# Creating a list of locations
locations = ["Swansea", "Cardiff", "Newport", "Wrexham"]
print("Original locations:", locations)

Original locations: ['Swansea', 'Cardiff', 'Newport', 'Wrexham']


<br>

### 3.1 **Accessing List Items by Index**

Each item in a list is assigned an index based on its position, starting with 0. This means the first item has an index of 0, the second an index of 1, and so on. Here's how you can access an item by its index:

In [28]:
print(locations[0])
print(locations[-1])    # What happens when we use a negative index?
print(locations[2])

Swansea
Wrexham
Newport


In [29]:
# Accessing the third item in the list
third_location = locations[2]  # indexing starts at 0!
print("The third location is:", third_location)

The third location is: Newport


**Note**: The rule to remember is that indexing starts at 0. So, the list above has positions 0, 1, 2 and 3. Asking for `locations[4]` will throw an error: the list has only four items, so the last one (Wrexham) is at index 3, not 4.

In [31]:
locations[4]

IndexError: list index out of range

**Tip**: You can use negative indexing to access list items in reverse. So `mylist[-1]` will return the last item in the list. `mylist[-2]` will return the 2nd to last item and so on.
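
As a small sketch of the tip above (the variable names here are illustrative, and the list is re-created so the example runs on its own), negative indices count back from the end of the list, and they can also go out of range:

```python
locations = ["Swansea", "Cardiff", "Newport", "Wrexham"]

last_location = locations[-1]          # same item as locations[3]
second_last_location = locations[-2]   # same item as locations[2]
print(last_location, second_last_location)

# Negative indices can run out of range too: a 4-item list has no index -5
try:
    locations[-5]
    out_of_range = False
except IndexError:
    out_of_range = True
print("Index -5 out of range:", out_of_range)
```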

<br>

### 3.2 **Modifying List Items**

You can change the value of a list item by accessing it through its index:

In [32]:
# Changing the second location
locations[1] = "Bridgend"
print("Updated locations:", locations)

Updated locations: ['Swansea', 'Bridgend', 'Newport', 'Wrexham']


<br>

### 3.3 **Adding and Removing Items**

Items can be added to the end of a list using the **`.append()`** method, and removed using the **`.remove()`** method:

In [33]:
# Adding a new location
locations.append("Aberystwyth")
print("Locations with Aberystwyth:", locations)

# Removing a location
locations.remove("Swansea")
print("Locations after removing Swansea:", locations)

Locations with Aberystwyth: ['Swansea', 'Bridgend', 'Newport', 'Wrexham', 'Aberystwyth']
Locations after removing Swansea: ['Bridgend', 'Newport', 'Wrexham', 'Aberystwyth']
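
Two details of `.remove()` are worth knowing: it deletes only the first matching item, and it raises a `ValueError` if the item is not in the list. Here is a minimal sketch, using a made-up list `towns` that contains a duplicate entry:

```python
towns = ["Bridgend", "Newport", "Wrexham", "Newport"]

# Only the first "Newport" is removed; the second one stays
towns.remove("Newport")
print(towns)   # ['Bridgend', 'Wrexham', 'Newport']

# Checking membership with `in` first avoids a ValueError
if "Cardiff" in towns:
    towns.remove("Cardiff")
else:
    print("Cardiff is not in the list, nothing removed")
```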



**Tip**: Python has lots of built-in operations you can use with lists:
- [Python built-in functions](https://www.w3schools.com/python/python_ref_functions.asp)
- [Python list methods](https://www.w3schools.com/python/python_ref_list.asp)

<br>

### 3.4 **Combining lists**

Two lists can be added together to combine them into one list. Lists respond to the "+" and "*" operators, much like strings and numbers.

In [35]:
list1 = [4, 5, 6]
list2 = ['a', 'b', 'c']

new_list = list1 + list2    # Combining list1 and list2 into new_list
print(new_list)

[4, 5, 6, 'a', 'b', 'c']
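
The cell above only uses `+`. As a short sketch of the `*` operator mentioned in the text (the names `repeated` and `divider` are illustrative), multiplying a list or a string by an integer repeats its contents:

```python
list1 = [4, 5, 6]

repeated = list1 * 2     # repeat the list's items twice
print(repeated)          # [4, 5, 6, 4, 5, 6]

divider = "-" * 10       # strings behave the same way
print(divider)           # ----------

print(list1)             # [4, 5, 6] (the original list is unchanged)
```

Both `+` and `*` build a new list, so the original lists are left as they were.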


<br>

### <font color='Green'><strong>List Exercises: </strong></font>

Try completing these exercises on lists.

**EX 1** Try adding a city name to the `locations` list and then remove "Newport". Print the list before and after to check the result.

In [36]:
### 1. Add Solution Here ###



['Bridgend', 'Newport', 'Wrexham', 'Aberystwyth']

**EX 2**  Find the number of items in the `locations` list. Hint: check the linked resources above.



In [37]:
### 2. Add Solution Here ###

**EX 3**  Define a list of only numeric (i.e. type integer or float) values. Then find the sum of all the numbers in that list.

Hint: check the linked resources above.

In [None]:
### 3. Add Solution Here ###

mylist =

**EX 4** (optional) Given the following lists:

In [None]:
device1 = ['iPhone 16', 999]
device2 = ['Galaxy S22', 949]
device3 = ['Pixel 3', 699]

- Add a colour to each of the devices.
- The Galaxy S22 has been updated to the S24, change its name.
- Combine all the devices into a list.

**Example expected output:** [['iPhone 16', 999, 'Grey'], ['Galaxy S24', 949, 'Black'], ['Pixel 3', 699, 'White']]

In [38]:
### 4. Add Solution Here ###


<br>

---

<br>
<br>

## 4. Dictionaries

! **IMPORTANT** !

#### Dictionaries: Python's Equivalent to JSON

In Python, dictionaries (`dict`) are collections of key-value pairs, much like objects in JSON (JavaScript Object Notation). Vega-Lite, as well as many APIs, use JSON syntax, so understanding how to work with dictionaries in Python will be extremely useful. JSON objects map almost directly onto Python dictionaries, which makes it easy to convert between the two.

### 4.1 **What is a Dictionary?**

A dictionary is a data type that stores items as key-value pairs. Each key is unique, and you use the key to access its associated value.

In [2]:
# Example of a dictionary
person = {
    "name": "Alice",
    "age": 30,
    "city": "Newport"
}

print(person)

{'name': 'Alice', 'age': 30, 'city': 'Newport'}


Here, "name", "age", and "city" are the keys, and "Alice", 30, and "Newport" are the corresponding values.

<br>

### 4.2 **Accessing Dictionary Values**

To retrieve a value from a dictionary, you use the key associated with it.

In [3]:
# Accessing values
print(person["name"])  # Outputs: Alice
print(person["age"])   # Outputs: 30

Alice
30


TODO: Test what happens when you try to access a key that doesn't exist.

In [None]:
### Add code here ###


If you try to access a key that doesn't exist, Python will raise a KeyError. To avoid this, you can use the .get() method.

In [4]:
# Using .get() to avoid errors
person.get("job")   # 'job' doesn't exist, so .get() returns None (the notebook shows no output for None)

When using the `.get()` method, we can customise the default value that is returned if the key is missing.

In [5]:
print(person.get("job", "No job information"))  # Outputs: No job information

No job information
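
To see the three behaviours side by side, here is a minimal sketch (the dictionary is re-created so the example runs on its own, and the variable names are illustrative). It catches the `KeyError` from square-bracket access, then compares it with `.get()` and the `in` operator:

```python
person = {"name": "Alice", "age": 30, "city": "Newport"}

# Square brackets raise a KeyError when the key is missing
try:
    job = person["job"]
except KeyError as error:
    job = None
    print("KeyError:", error)

# .get() returns None (or a chosen default) instead of raising an error
job_or_none = person.get("job")
job_or_default = person.get("job", "No job information")
print(job_or_none, "|", job_or_default)

# `in` checks whether a key exists without retrieving its value
has_job = "job" in person
print(has_job)
```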


<br>

### 4.3 **Dictionary Methods**

Dictionaries have several useful methods that allow you to work with them efficiently. Just like with lists, we can call these methods on any dictionary variable that we have defined and stored in memory. Here are a few useful ones:

- `.keys()`: Returns a view object containing all the keys.
- `.values()`: Returns a view object containing all the values.
- `.items()`: Returns key-value pairs as tuples.

**Hint:** More built-in [dictionary methods](https://www.w3schools.com/python/python_ref_dictionary.asp).

In [6]:
# Getting all keys and values
print(person.keys())    # Outputs: dict_keys(['name', 'age', 'city'])
print(person.values())  # Outputs: dict_values(['Alice', 30, 'Newport'])

dict_keys(['name', 'age', 'city'])
dict_values(['Alice', 30, 'Newport'])


In [9]:
person.items()

dict_items([('name', 'Alice'), ('age', 30), ('city', 'Newport')])
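
The "view objects" returned by these methods stay linked to the dictionary, while `list()` takes a fixed snapshot. A short sketch, using illustrative names `keys_view` and `key_list` and a re-created dictionary:

```python
person = {"name": "Alice", "age": 30, "city": "Newport"}

keys_view = person.keys()      # a live view of the dictionary's keys
key_list = list(keys_view)     # a separate list, copied at this moment

person["job"] = "Data Scientist"

print(keys_view)   # dict_keys(['name', 'age', 'city', 'job'])
print(key_list)    # ['name', 'age', 'city']
```

Converting to a list is also handy when you want to index into the keys or values, since view objects do not support indexing.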

<br>

### 4.4 **Modifying Dictionaries**

You can add new key-value pairs to a dictionary, just as easily as you assign values to regular variables. You can also update existing key-value pairs by accessing the key and assigning a new value to it.

In [11]:
# Show the current `person` dictionary.
print(person)

{'name': 'Alice', 'age': 30, 'city': 'Newport'}


In [15]:
# Adding a new key-value pair
person["job"] = "Data Scientist"
print(person)

{'name': 'Alice', 'age': 30, 'city': 'Newport', 'job': 'Data Scientist'}


Just like assigning any other variable, we reference the dictionary by its name (person) and specify the key we want to add ("job"). Then, we assign a value ("Data Scientist") to that key. If the key doesn't exist, it will be created.

In [16]:
# Updating an existing value
person["age"] = 31
print(person)

{'name': 'Alice', 'age': 31, 'city': 'Newport', 'job': 'Data Scientist'}


To update a value, we reference the dictionary and the existing key ("age") and assign it a new value (31). The original value (30) is overwritten with the new one.

**Note**: The syntax for both adding and updating key-value pairs is the same. The difference is that if the key already exists, the value will be updated; if the key doesn't exist, a new key-value pair will be added.

<br>

### 4.5 **Saving a Dictionary as a JSON File**

Since Python dictionaries can easily be converted to JSON format, we can save them to a file using the `json` package. This will allow us to explore, edit, and export JSON data, which could then be used as a data source in our Vega-Lite charts.

This will be the first time we've encountered a Python package in this notebook, so let's take a moment to understand what they are and why they're useful.

> **What are Python Packages?**
>
>
> A Python package is a collection of pre-written code that adds extra functionality to your Python programs. By using packages, you can avoid writing everything from scratch and leverage existing solutions to common problems.
>
> Think of a package as a toolkit that contains useful tools (functions and classes) that extend the basic capabilities of Python.
>
> **How Do We Use Python Packages?**
>
> In Python, you import a package using the `import` statement, which gives you access to its tools.
>
> For example, we'll use the `json` package to work with JSON data. By importing `json`, we can easily save dictionaries as JSON files and read them back into Python.
>
> **Built-in vs. External Packages**
>
> Some packages, like `json`, are built into Python. You don't need to install these; they come with Python by default. However, there are many useful packages that are not built-in. These need to be installed before you can use them.
>
> In this notebook, we will only use built-in packages, such as json. In future notebooks, you'll encounter other Python packages that are not built-in and will need to be installed first (typically using a tool called `pip`—Python's package manager, which we'll see later).
>
> **Analogy with JavaScript**:
> You've already seen how we can use JavaScript libraries in our HTML files by including a `<script>` tag in the `<head>` section to load an external library like Vega-Lite:
>
> ```html
> <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
> ```
>
> This is similar to how Python packages work. Just like how the `<script>` tag gives your HTML access to Vega-Lite’s charting functionality, `import json` gives your Python code access to JSON handling functions.
>



...back to code

To save a dictionary as a JSON file, we use the `json` package and a special Python construct called `with open(...)`. Let's break down how this works.

In [18]:
import json

# Saving the dictionary as a JSON file
with open("person.json", "w") as file:
    json.dump(person, file)

1. `with open("person.json", "w") as file:`
  - The `open()` function is used to open a file. It takes two arguments:
    - The **filename** (`"person.json"`) specifies the name of the file we want to write to.
    - The **mode** (`"w"`) stands for "write mode." This means we are opening the file to write data to it. If the file doesn't exist, Python will create it. If the file already exists, its contents will be overwritten.
  - Using `with` ensures that the file is properly opened and closed automatically, even if an error occurs.

2. `json.dump(person, file)`
  - The `json.dump()` function writes the contents of the `person` dictionary to the file, converting the dictionary into a JSON-formatted string.
  - The first argument is the dictionary we want to save (`person`), and the second argument is the file object where the data should be written.
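
As a variation on the steps above, `json.dump()` also accepts an `indent` argument that makes the saved file easier to read. The sketch below is an illustration rather than part of the original workflow: the file name `person_pretty.json` and the use of a temporary folder (so that no existing file is overwritten) are assumptions made for this example.

```python
import json
import os
import tempfile

person = {"name": "Alice", "age": 31}

# Write into a temporary folder so no existing file is touched
output_path = os.path.join(tempfile.mkdtemp(), "person_pretty.json")

with open(output_path, "w") as file:
    json.dump(person, file, indent=4)   # indent=4 puts each key on its own line

# Read the raw text back to see exactly what was written
with open(output_path, "r") as file:
    raw_text = file.read()

print(raw_text)
```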

Now, check the files tab to the side of our code editor and we should see our `person.json` file saved there. We could then download that to our local machine.

<br>

### 4.6 **Reading a JSON File into a Dictionary**

To read a JSON file into a Python dictionary, use the `json.load()` function. Let's load in the JSON data we just saved.

In [19]:
# Reading a JSON file into a dictionary
with open("person.json", "r") as file:    # We're now using "r" to specify 'read' mode.
    person_from_json = json.load(file)

print(person_from_json)

{'name': 'Alice', 'age': 31, 'city': 'Newport', 'job': 'Data Scientist'}


So, using this we can load any JSON data we've saved into Python, ready for analysis, cleaning etc.
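
The same conversion also works without a file: `json.dumps()` turns a dictionary into a JSON string and `json.loads()` turns a JSON string back into a dictionary. The sketch below uses a made-up `record` dictionary to show one of the few visible differences between the two notations: Python's `True` and `None` are written as `true` and `null` in JSON.

```python
import json

record = {"name": "Alice", "employed": True, "job": None}

# Dictionary -> JSON text
json_text = json.dumps(record)
print(json_text)   # {"name": "Alice", "employed": true, "job": null}

# JSON text -> dictionary
restored = json.loads(json_text)
print(restored == record)   # True
```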

<br>

### 4.7 **Nested Dictionaries**

Dictionaries can contain other dictionaries as values, creating a nested structure. This is particularly useful when dealing with complex data, such as hierarchical structures in JSON (which is commonly used by APIs and Vega-Lite specifications).

#### 4.7.1 Creating a Nested Dictionary

A nested dictionary is simply a dictionary where one or more of the values is another dictionary.

In [20]:
person = {
    'name': 'John',
    'age': 30,
    'address': {'street': '123 Main St', 'city': 'Anytown'}
}
print(person)

{'name': 'John', 'age': 30, 'address': {'street': '123 Main St', 'city': 'Anytown'}}


**Tip**: We can use the `json.dumps()` function with an `indent` parameter to print our dictionary with nice formatting.

In [23]:
print(json.dumps(person, indent=4))

{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Anytown"
    }
}


Here's an example representing a simple Vega-Lite chart configuration:

In [1]:

# Example of a nested dictionary for a Vega-Lite chart
vega_lite_chart = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "values": [
          {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}
        ]
    },
    "mark": {
        "type": "bar"
    },
    "encoding": {
        "x": {"field": "a", "type": "nominal", "axis": {"labelAngle": 0}},
        "y": {"field": "b", "type": "quantitative"}
    }
}

print(vega_lite_chart)

{'$schema': 'https://vega.github.io/schema/vega-lite/v5.json', 'data': {'values': [{'a': 'A', 'b': 28}, {'a': 'B', 'b': 55}, {'a': 'C', 'b': 43}]}, 'mark': {'type': 'bar'}, 'encoding': {'x': {'field': 'a', 'type': 'nominal', 'axis': {'labelAngle': 0}}, 'y': {'field': 'b', 'type': 'quantitative'}}}


In this example, the `data`, `mark` and `encoding` keys all contain nested dictionaries. Values in a dictionary can also be lists (i.e. arrays), which allows us to define data in-line with a list of dictionaries.

This structure mirrors the typical format of Vega-Lite specifications, where different parts of the chart (data, encoding, mark, etc.) are organised into separate nested dictionaries.

#### 4.7.2 Accessing Nested Dictionary Values

To access values in a nested dictionary, you chain the keys together. For example, to access the `field` value of the x-axis in the `encoding` dictionary:

In [27]:
# Accessing a nested value
x_field = vega_lite_chart["encoding"]["x"]["field"]
print(x_field)  # Outputs: a

a


This retrieves the "field" value under "x" in the "encoding" dictionary. You can keep chaining keys to access deeper levels of nesting.
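
The same chaining works when a list sits inside the nesting: use a key to reach the list, then an index to pick an item from it. Here is a minimal sketch using a cut-down, illustrative `chart` dictionary (not the full specification above), which also shows `.get()` on a nested dictionary for keys that may be missing:

```python
chart = {
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}]},
    "encoding": {"x": {"field": "a", "type": "nominal"}},
}

# Key -> key -> list index -> key
first_row = chart["data"]["values"][0]
first_b = chart["data"]["values"][0]["b"]
print(first_row, first_b)

# .get() avoids a KeyError when a nested key might not exist
y_encoding = chart["encoding"].get("y")
title = chart.get("title", "No title set")
print(y_encoding, "|", title)
```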

#### 4.7.3 Modifying Nested Dictionary Values

You can also update values in a nested dictionary using a similar approach. For example, to change the chart type in the mark section:

In [2]:
# Updating the "type" value inside the "mark" dictionary
vega_lite_chart["mark"]["type"] = "line"
print(vega_lite_chart)

{'$schema': 'https://vega.github.io/schema/vega-lite/v5.json', 'data': {'values': [{'a': 'A', 'b': 28}, {'a': 'B', 'b': 55}, {'a': 'C', 'b': 43}]}, 'mark': {'type': 'line'}, 'encoding': {'x': {'field': 'a', 'type': 'nominal', 'axis': {'labelAngle': 0}}, 'y': {'field': 'b', 'type': 'quantitative'}}}


#### 4.7.4 Adding & Removing Nested Entries

Just like with regular dictionaries, you can add new key-value pairs inside nested dictionaries.

In [3]:
# Adding a new key-value pair inside the "encoding" dictionary
vega_lite_chart["encoding"]["color"] = {"field": "region", "type": "nominal"}
print(vega_lite_chart)

{'$schema': 'https://vega.github.io/schema/vega-lite/v5.json', 'data': {'values': [{'a': 'A', 'b': 28}, {'a': 'B', 'b': 55}, {'a': 'C', 'b': 43}]}, 'mark': {'type': 'line'}, 'encoding': {'x': {'field': 'a', 'type': 'nominal', 'axis': {'labelAngle': 0}}, 'y': {'field': 'b', 'type': 'quantitative'}, 'color': {'field': 'region', 'type': 'nominal'}}}
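
Removing nested entries follows the same pattern. The sketch below uses a cut-down, illustrative `chart` dictionary and two standard ways to delete a key: the `.pop()` method, which returns the removed value, and the `del` statement, which simply deletes it.

```python
chart = {
    "mark": {"type": "line"},
    "encoding": {
        "x": {"field": "a", "type": "nominal", "axis": {"labelAngle": 0}},
        "y": {"field": "b", "type": "quantitative"},
        "color": {"field": "region", "type": "nominal"},
    },
}

# .pop() removes the key and hands back its value
removed = chart["encoding"].pop("color")
print(removed)

# del removes the key without returning anything
del chart["encoding"]["x"]["axis"]
print(chart["encoding"])
```

Both raise a `KeyError` if the key does not exist, so it can be worth checking with `in` first.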


<br>

### <font color='Green'><strong>Dictionary Exercises: </strong></font>

Try completing these exercises on dictionaries.

### Exercise:

1. Create a dictionary that represents a simple Vega-Lite chart configuration (e.g., a bar chart with a title and a few fields like `x`, `y`, `data`, etc.). Make sure to include some data values.

2. Save this dictionary as a JSON file, then read it back in and print the content to verify.

3. Copy the JSON into the online Vega Editor. Does it produce a chart?

<br>
<br>

---

## Bonus challenge

**Bonus EX1** Python notebooks are made up of **text** cells and **code** cells. These **text** cells use `markdown` syntax.

- Explain some basics on how markdown works, and how we can use markdown syntax or even HTML to style our cells.
- When you created your GitHub repositories, you should have also created an empty `README.md` file. (Tip: '.md' is the file extension for markdown files, just like '.json' is for JSON type files.)
- Go to your GitHub repositories and edit this README.md to add some description about your portfolio and project.

**Hint**: Check out this [markdown cheatsheet](https://www.markdownguide.org/cheat-sheet/) to find out how to style markdown text.