# Essential Data Structures in Python
## Week 4 

This notebook has 4 sections, corresponding to the topics covered this week: 
1. List
    - 1.1 List comprehension 
    - 1.2 Working with lists 
2. Array
    - 2.1 Creating arrays 
    - 2.2 Reshaping arrays 
    - 2.3 Array Truthy and Falsy
3. Dictionary
    - 3.1 Dictionary comprehension 
4. Data frame
    - 4.1 the Index 
    - 4.2 Working with data frames
    - 4.3 Missing values in data frames 

In [None]:
# start by loading required packages and modules 
import pandas as pd 
import numpy as np

## 1. List 

We can create lists in Python using square brackets `[]`. Lists are heterogeneous data structures, so they can include data elements of various data types, including lists themselves!

In [None]:
list_a = ["cat", "dog", 45.3, "horse", "fish", 68]

print(list_a)

In [None]:
print(list_a[0])

In [None]:
print(list_a[-2])

Because lists are a built-in data structure, they come with a lot of useful methods. To see these, type the name of the list object followed by a dot and hit the [TAB] key.

In [None]:
## try it here - put your cursor after the . and hit [TAB]
list_a.

We can also request information about lists, such as the length using `len()`:

In [None]:
len(list_a)

### 1.1 List comprehension 

List comprehension is an elegant way of creating lists, often with a single line of code. Compare the following two cells, both of which produce the same output - namely, a new list containing only fruits with the letter `"a"` in the name. 

List comprehension offers a streamlined syntax as follows: 
* `newlist = [expression for item in iterable if condition == True]`

In [None]:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
# create an empty list 
newlist = []

# use a for loop to fill the empty list 
for x in fruits:
    if "a" in x:
        newlist.append(x)

print(newlist)

In [None]:
## list comprehension solution 

newlist_comp = [x for x in fruits if "a" in x]

print(newlist_comp)
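
The *expression* part of a comprehension can also transform each item rather than just copy it. As a small sketch (the name `upper_fruits` is made up for this example), the code below keeps the fruits containing `"a"` and converts them to upper case:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# expression: x.upper()   condition: "a" in x
upper_fruits = [x.upper() for x in fruits if "a" in x]

print(upper_fruits)
```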

### 1.2 Working with lists 

To add an item to the end of a list, for example, we can use `listobject.append()`. 

In [None]:
list_a.append("antelope")

print(list_a) 

## run this code cell a few times and see what happens

To remove a specific item from a list we can use `listobject.remove()`

In [None]:
list_a.remove(45.3)

print(list_a)

We can join two lists together in Python by simply using the `+` sign 

In [None]:
list_a + fruits

You can sequentially iterate over a list using a for loop. In Python, we use a "for in" loop construction, which is similar to the "for each" loops you find in C++ and Java. The for loop syntax is as follows: 

` for iterator in sequence: 
     statement(s)`

Note that white space indentation, as mentioned in Week 1, is important here! If you go to a new line after the colon `:` (by pressing ENTER), Jupyter will automatically do this indentation for you. 

In [None]:
# using a for loop to iterate over a list and print each element 
## using range(len(list)) to obtain the index of each element 

for i in range(len(list_a)):
    print( list_a[i] )
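
If you do not need the index, you can loop over the items directly, and if you need both the index and the item, the built-in `enumerate()` gives you the pair. A minimal sketch, using a made-up list called `animals`:

```python
animals = ["cat", "dog", "horse"]

# loop over the items directly - no index needed
for item in animals:
    print(item)

# loop over (index, item) pairs with enumerate()
for i, item in enumerate(animals):
    print(i, item)
```

Both styles avoid writing `range(len(...))` by hand, which leaves fewer places for an off-by-one mistake.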

As mentioned above, lists can contain other lists. A list of lists has 2 dimensions, and we can index it with `list[dim1][dim2]`. Let's see how this works.

In [None]:
# let's create a two-dimensional list:
list_of_lists = [[3,4,5,6,7], [30,40,50,60,70]] 

# print the big list
print(list_of_lists) 

# print first sub-list
print(list_of_lists[0]) 

# print first number in first sub-list
print(list_of_lists[0][0])

In [None]:
# remember indexes count from 0. So first element has index 0, second index 1, third index 2, etc
# to go into the second list, then to its fourth element (which should have value 60), you would use
print(list_of_lists[1][3])

# negative numbers count from the end
# to get from the first list its last element (which should be 7), you would use
print(list_of_lists[0][-1])

In most instances, your data will have many dimensions. As a recap: 
    
- a variable has **zero dimensions** - you do not need any more index/address to know what's in it
- a list/array has **one dimension** - index is the way to address individual items in a list
- a list of lists has **two dimensions** - index of the top list, and then an index of the inner list  

## 2. Array 

An Array is basically like a List, but it has a number of new, very powerful methods and syntax that make data operations easier and faster. Theoretically you could do everything that we do with Arrays by just using good old Lists, but it would take more time and be less compatible with other libraries.

To create an Array, you can pass a list to `np.array(your_list)`. Notice the `np.array` at the beginning - it means that you are using the `array` function from the ```np``` library. `np` is a short name for `numpy` (which we gave it in `import numpy as np` at the top of this notebook).

To work with array data structures, we need the `NumPy` package. In fact, `NumPy` was designed specifically to perform numerical operations with n-dimensional arrays. Arrays store values of the same data type. Because of this, NumPy can apply an operation to a whole array at once (vectorization), which is much faster than looping over the elements one by one.

In [None]:
my_list = [3, 7, 5, 5]
print(my_list)

In [None]:
# you can create an array by feeding in a List directly or as a data object:

my_array = np.array(my_list)
print(my_array)

my_array2 = np.array([3, 0, 5, 0])
print(my_array2)

# notice it is printed a bit differently than a list! 
# the commas are not present compared to the list print out in the cell above
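
To make the idea of vectorization a little more concrete, here is a small sketch comparing a list and an array (the names `prices_list` and `prices_array` are made up for this example). With an array, arithmetic is applied to every element at once; with a list, you need a loop or a comprehension:

```python
import numpy as np

prices_list = [3, 7, 5, 5]
prices_array = np.array(prices_list)

# with a list we need a comprehension to double every element
doubled_list = [p * 2 for p in prices_list]

# with an array the multiplication is applied to each element
doubled_array = prices_array * 2

print(doubled_list)
print(doubled_array)

# careful: multiplying a list by 2 repeats the list instead of doubling the values
print(prices_list * 2)
```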

Arrays are often used in situations where there are many dimensions. So a grid of 

`
1,  2,  3,  4
11, 12, 13, 14
21, 22, 23, 24
`

Can be represented as:

In [None]:
scores = np.array([ [1, 2, 3, 4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
# do you see it? A list with 3 lists, each with 4 items inside!

print(scores)

# let's print what type of a thing it is:
print(type(scores))

`NumPy` arrays bring with them a new addressing style. We can use `array[first_dimension, second_dimension, third_dimension, ....]`, meaning you pass the index for each dimension separated by commas. You can also use the same style (the pure Python way) as with the lists we looked at above, ```my_array[first_dimension][second_dimension]```, but using a single pair of square brackets with commas between dimensions is more common. Let's see how this works with some multidimensional arrays.

In [None]:
## above we created a multidimensional array called scores 

print(scores[0, 0])
print(scores[0, 1])
print(scores[1, 2])
print(scores[1, -1])
print(scores[-1, -1])

# look at the printed output and see if you understand why these numbers are printed

We can use the index to not only get a value, but also change it. 

In [None]:
# let's change some items

# reassign values 
scores[0,0] = 10
scores[0,1] = 20

# change values mathematically 
scores[1,3] += 30
scores[-1,-1] += 40

print(scores)

You can also omit one of the dimensions, and replace everything in a row or column. Use a colon `:` to indicate 'everything'.

In [None]:
scores = np.array([ [1,  2,   3,  4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
 
scores[0,:] = 10 
print(scores)
# first dimension: value 0, second dimension: all values
# note you could also use the simpler version scores[0] = 10

In [None]:
scores = np.array([ [1,  2,   3,  4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])
 
scores[:,2] = 30  
print(scores)
# first dimension: all values, second dimension: value 2
# note that here you cannot simplify it to scores[,2] = 30

To slice arrays there is a new syntax, which can be used in lists and data frames as well! ` my_array[start_index : stop_index : step/jump]`

In [None]:
digits = np.arange(10,20)
print(digits)
print(digits[2:7:2]) # from index 2, up to (but not including) index 7, jumping every 2

In [None]:
print(digits[:7:2]) # from beginning, up to (but not including) index 7, jumping every 2

In [None]:
print(digits[2::2]) # from index 2, till end, jumping every 2

In [None]:
print(digits[::2]) # all, jumping every 2

In [None]:
# with negative step/jump the array gets reversed
print(digits[::-1]) # all, but index counting down
print(digits[::-3]) # all, but index counting down every 3
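
The same `start : stop : step` slicing can be combined with the comma-separated addressing style, giving one slice per dimension. A short sketch on a made-up 3 x 4 array called `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [11, 12, 13, 14],
                 [21, 22, 23, 24]])

# rows 0 and 1, columns 2 and 3 (the stop index is not included)
top_right = grid[0:2, 2:4]
print(top_right)

# all rows, every second column
every_other_col = grid[:, ::2]
print(every_other_col)
```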

There is a variety of information you can request about arrays through attributes (note: no brackets needed), including: 
* dimension with `.ndim`
* shape with `.shape`
* size with `.size`
* data type with `.dtype`

In [None]:
# you can request some info about a multi-dimensional array:
scores = np.array([ [1, 2, 3, 4],
                    [11, 12, 13, 14],
                    [21, 22, 23, 24]
                   ])

print("dimensions:", scores.ndim)
print("shape:", scores.shape)
print("size:", scores.size)
print("data type:", scores.dtype)

### 2.1 Creating arrays 

You can specify the default value and type of your new empty array:

- full of zeros with `np.zeros()`
- full of ones with `np.ones()`
- full of some other value with `np.full(shape, some_value)`
- full of random values

In [None]:
np.zeros(4)

As you can see, by default the values created are floats. But you can specify the data type with the `dtype` argument. It is good practice to state the desired data type explicitly when creating arrays with specified values. 

In [None]:
np.zeros(5, dtype = 'int')

In [None]:
np.zeros(5, dtype = 'bool')

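You can also create multi-dimensional arrays by giving the sizes of all dimensions in a tuple (which we will learn more about next week - essentially it is an immutable list made with `()`):

- `(10)` - array of 10 elements (strictly speaking `(10)` is just the number 10; a one-element tuple is written `(10,)`, and NumPy accepts either)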
- `(5,10)` - 5 sets of 10 elements
- `(3,5,10)` - 3 sets of 5 sets of 10 elements

In [None]:
np.zeros((10), dtype = float)

In [None]:
np.zeros((5,10), dtype = int)

In [None]:
np.zeros((3, 5,10), dtype = int)
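
A quick way to check what a size argument produced is to look at `.shape`. The sketch below also shows that `(10)` is a plain integer while `(10,)` is a tuple; the variable names are made up for illustration:

```python
import numpy as np

# parentheses alone do not make a tuple - the trailing comma does
print(type((10)))    # int
print(type((10,)))   # tuple

one_dim_a = np.zeros(10)
one_dim_b = np.zeros((10,))
print(one_dim_a.shape, one_dim_b.shape)   # both are one-dimensional with 10 elements

three_dim = np.zeros((3, 5, 10), dtype=int)
print(three_dim.shape)   # 3 sets of 5 sets of 10 elements
print(three_dim.size)    # total number of elements: 3 * 5 * 10
```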

You can also specify values other than zero with `np.ones( dimensions )` or with `np.full(dimensions, value)`

In [None]:
np.ones((2,5), dtype = int)

In [None]:
np.full((2,5), 3.14)

In [None]:
# so as you can see, these two do the same thing:

print(np.ones((2,5), dtype = int))
print(np.full((2,5), 1))

# in programming there are often multiple ways of doing the same thing! 

You can also create values

- from a range with `np.arange(start, stop, step)`
- with even split between two values `np.linspace(start, end, slices)`
- identity matrix with `np.eye(size)`
- to repeat a pattern with `np.tile()`
- full of random data with `np.random.randint(max_value, size = (size_tuple))`

In [None]:
np.arange(10)

In [None]:
np.arange(5,15)

In [None]:
np.arange(5, 50, 10) # range has start, end, jump

In [None]:
np.linspace(0, 1, 5) # from_value, to_value, how_many_slices

In [None]:
np.linspace(0, 100, 3, dtype = int) # from_value, to_value, how_many_slices

In [None]:
np.eye(4)

In [None]:
pattern = np.array([0, 1, 2])

np.tile(pattern, 2) # the number (or tuple) given to tile says how many times to repeat the pattern - here 2 times

In [None]:
np.tile(pattern, (3,2))
# repeat 2 times in one dimension, and 3 times in another dimension 

In [None]:
# random numbers
print(np.random.randint(10, size = 6))

# every time you run this cell, your random numbers will be different. 
## Try it!

In [None]:
# but if you specify a seed, your random numbers will be the same every time you run this cell! 
# this can be very helpful when debugging or in some statistical analyses using random numbers or sampling 

np.random.seed(0) # plant the seed 0 - this could be any number
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))
print(np.random.randint(10, size=6))

## run this cell a few times to see that the numbers do not change
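
NumPy also provides a generator-based interface, `np.random.default_rng()`, which keeps the seed inside a generator object instead of setting it for the whole notebook. The sketch below is an alternative to the cell above, not a replacement for it; the seed `42` and the variable names are arbitrary choices for illustration:

```python
import numpy as np

# two generators created with the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# 6 integers from 0 up to (but not including) 10
draw_a = rng_a.integers(10, size=6)
draw_b = rng_b.integers(10, size=6)

print(draw_a)
print(draw_b)

# same seed, same sequence of draws
print(np.array_equal(draw_a, draw_b))
```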

Let's now create some arrays full of random values. 

In [None]:
# ONE DIMENSION - with size 6
array1 = np.random.randint(100, size = 6)
print(array1)
print(array1[0])

In [None]:
# TWO DIMENSIONS - with size 4 for first and 5 for the second
# these could be scores for four courses, each with 5 students

array2 = np.random.randint(100, size = (4, 5))
print(array2)
print() # empty print statement to organise output a bit easier 

print(array2[3])
print()
print(array2[3, 4])

In [None]:
# THREE DIMENSIONS - with size 3 for first and 4 for the second and 5 for third
array3 = np.random.randint(100, size = (3, 4, 5))

print(array3)

# change the numbers in size to ensure you are understanding what they are doing 

### 2.2 Reshaping arrays 

We can change the dimensions of an array with `.reshape()`. We can concatenate or flatten arrays (and lists!) using `np.concatenate()`, which joins a list of arrays into one array and so removes the outer level of nesting. We can also split one array into many arrays using predefined indexes with the `np.split()` function. 

In [None]:
# lets start out with a one dimensional array 
print(np.arange(12)) 

In [None]:
print(np.arange(12).reshape((2, 6)))

In [None]:
print(np.arange(12).reshape((4, 3)))

In [None]:
print(np.arange(12).reshape((3, 2, 2)))

In [None]:
# This causes an error - you can't split 12 numbers into 4 sets of 5 numbers!
print(np.arange(12).reshape((4, 5)))
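
When you know all but one of the dimensions, you can pass `-1` for the unknown one and NumPy will work it out from the number of elements. A minimal sketch (variable names are made up), which also catches the error from an impossible shape so the cell keeps running:

```python
import numpy as np

numbers = np.arange(12)

# 3 rows, and let NumPy work out the number of columns
three_rows = numbers.reshape((3, -1))
print(three_rows.shape)

# 2 columns, and let NumPy work out the number of rows
two_cols = numbers.reshape((-1, 2))
print(two_cols.shape)

# an impossible shape raises a ValueError, which we can catch
try:
    numbers.reshape((4, 5))
except ValueError as err:
    print("reshape failed:", err)
```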

In [None]:
arraya = np.array([10,20,30])
arrayb = np.array([40,50,60])
parent_array = [arraya, arrayb]

print(parent_array)

In [None]:
print( np.concatenate(parent_array) )

In [None]:
## works the same for lists 
lista = [70, 80,90]
listb = [1, 2, 3]

print( np.concatenate([lista, listb]) )

In [None]:
# you can even concatenate lists and arrays together. They really are very similar
arraya = np.array([10,20,30])
arrayb = np.array([40,50,60])
lista = [70, 80,90]

print( np.concatenate([arraya, arrayb, lista]) )

In [None]:
# concatenation respects dimensions, it will flatten only the top dimension
two_dimension_array1 = np.array([ [1,2,3],    [4,5,6] ])
two_dimension_array2 = np.array([ [10,20,30], [40,50,60] ])

print(two_dimension_array1)
print()
print(two_dimension_array2)

In [None]:
print( np.concatenate([two_dimension_array1,two_dimension_array2]) )

Concatenate has an extra argument called `axis`: ```np.concatenate([arr1, arr2], axis = 0)``` which by default is 0

- `axis = 0` (the default) - join along the first dimension: the rows of the second array are stacked below the rows of the first
- `axis = 1` - join along the second dimension: the arrays are placed side by side, so each row gets longer

In [None]:
two_dimension_array1 = np.array([ [1,2,3], [4,5,6] ])
two_dimension_array2 = np.array([ [10,20,30], [40,50,60] ])

print( np.concatenate([two_dimension_array1,two_dimension_array2], axis = 0) )

In [None]:
print( np.concatenate([two_dimension_array1, two_dimension_array2], axis = 1) )
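
Checking `.shape` is a handy way to see which dimension grew. A small sketch with two made-up 2 x 3 arrays called `left` and `right`:

```python
import numpy as np

left = np.array([[1, 2, 3], [4, 5, 6]])
right = np.array([[10, 20, 30], [40, 50, 60]])

# axis = 0: more rows, same number of columns
stacked_rows = np.concatenate([left, right], axis=0)
print(stacked_rows.shape)

# axis = 1: same number of rows, more columns
side_by_side = np.concatenate([left, right], axis=1)
print(side_by_side.shape)
print(side_by_side[0])
```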

In [None]:
digits = np.arange(1, 10)
print(digits)

three_sub_arrays = np.split(digits, [3, 6])
print(three_sub_arrays)

In [None]:
# We have not seen this syntax yet, it's typically used in more advanced uses of Python. 
# You can specify many variables in one line, by assigning a List to them - see how useful data structures can be! 

a, b, c = [10,20,30]
print(a, b, c)

In [None]:
# so split can be used as follows: 

start, middle, end = np.split(digits, [3, 6])
print(start, middle, end)

In [None]:
# Note: you could achieve the same effect with many lines of code with np.arange()
# but that requires much more thinking and leaves more opportunities for mistakes 

first = np.arange(1, 4)
second = np.arange(4, 7)
third = np.arange(7, 10)
print(first, second, third)

# but why do something the hard way if there is a proper syntax for it?

### 2.3 Array Truthy and Falsy
Like numbers which we learned about in week 2 and 3, arrays evaluate as truthy and falsy depending on how they compare to 0. 

In [None]:
a1 = np.array([0])

print(a1)

print(len(a1))

print(bool(a1))

Even though `a1` has a length of 1, it is still falsy because its value is 0. When an array has more than one element, some elements might be falsy and some might be truthy, so there is no single obvious answer. NumPy therefore raises a `ValueError` whenever you ask for the truth value of an array with more than one element.

In [None]:
a2 = np.array([0, 1])

bool(a2) # raises a ValueError

In [None]:
## are any values truthy
print(a2.any())

## are all values truthy
print(a2.all())
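
In practice this means that an `if` statement on an array should say explicitly what is being asked. A short sketch, with made-up arrays `readings` and `empty`:

```python
import numpy as np

readings = np.array([0, 1, 0])

# ask an explicit question instead of writing `if readings:`
if readings.any():
    print("at least one value is truthy")

if not readings.all():
    print("at least one value is falsy")

# to check whether an array has any elements at all, use .size
empty = np.array([])
print(empty.size == 0)
```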

## 3. Dictionary 

As a reminder, we create dict data structures in Python using curly brackets `{}` and specifying the `key:value` pair.

In [None]:
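# note: this variable name shadows Python's built-in dict type; it works here, but a more specific name is safer in your own code
dict = {"city" : "Edinburgh",
       "university" : "UoE",
       "year" : 1583}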

print(dict)

As a built-in data structure, like lists, dictionaries come with a lot of useful methods. To see these, type the name of the dictionary object followed by a dot and hit the [TAB] key.

In [None]:
## try this here - put your cursor after the . and hit [TAB]
dict.

In [None]:
# to see the list of keys 
print(dict.keys())

#to see the list of values 
print(dict.values())

You can also access each key-value pair within a dictionary using the `.items()` method

In [None]:
print(dict.items())

Since we cannot have duplicate keys, you can change the `key:value` mapping by setting the key to a new value.

In [None]:
dict["city"] = "Glasgow"

print(dict)

Using the same syntax, but with a key that does not exist yet, you can add a new item to a dict data structure. 

In [None]:
dict["campus"] = "George Square"

print(dict)
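
Two everyday patterns follow from this: looking up a key that might not exist, and looping over the key-value pairs. The sketch below uses a made-up dictionary called `place` so that it runs on its own:

```python
place = {"city": "Edinburgh", "year": 1583}

# square brackets raise a KeyError if the key is missing;
# .get() returns None (or a default you choose) instead
print(place.get("campus"))
print(place.get("campus", "unknown"))

# loop over the key-value pairs with .items()
for key, value in place.items():
    print(key, "->", value)
```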

### 3.1 Dictionary comprehension 

Dictionary comprehension has been available since Python 2.7 and, like list comprehension, is an efficient way of creating new dictionaries. Dictionary comprehensions take the form: `{key: value for (key, value) in iterable}`

In [None]:
dict_comp = {x: x**2 for x in [1,2,3,4,5]} 

print(dict_comp)

Let's say that now we want to double each value in our dictionary that we created called `dict_comp`

In [None]:
dict_double = {k:v*2 for (k,v) in dict_comp.items()}

print(dict_double)

You can also use dictionary comprehension to make changes to the keys. Let's make the same dictionary as `dict_double` but also change the keys (here, multiplying each key by 10). 

In [None]:
dict_keys = {k*10:v for (k,v) in dict_double.items()}

print(dict_keys)
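
As with list comprehension, a dictionary comprehension can include a condition, and it can build a dictionary from two separate lists with `zip()`. A minimal sketch with made-up names and scores:

```python
names = ["Ada", "Grace", "Alan"]
scores = [90, 85, 70]

# pair each name with its score
score_by_name = {name: score for name, score in zip(names, scores)}
print(score_by_name)

# keep only the entries that meet a condition
high_scores = {name: score for name, score in score_by_name.items() if score >= 80}
print(high_scores)
```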

A useful feature of Python dictionaries is that they can be converted to data frames. By default, when a dictionary is converted to a data frame, each key becomes a column name and its value becomes the elements of the series (in other words, the rows of that column). The function we use for this is `pd.DataFrame.from_dict()`. Use `orient='index'` to create the DataFrame using the dictionary keys as rows instead.

In [None]:
pd.DataFrame.from_dict(dict, orient = "index")
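
To see the difference between the two orientations side by side, here is a small sketch using a made-up dictionary whose values are lists:

```python
import pandas as pd

data = {"name": ["a", "b"], "score": [1, 2]}

# default orientation: keys become column names
df_cols = pd.DataFrame.from_dict(data)
print(df_cols)

# orient = "index": keys become row labels
df_rows = pd.DataFrame.from_dict(data, orient="index")
print(df_rows)
```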

## 4. Data frames

A data frame is a 2 dimensional structure (rows and columns) which can contain elements of heterogeneous data types. Data frames are not a built-in Python data structure, thus we need the `pandas` package in order to access and work with this data structure.

We will be using the gapminder data set, which is available in R in the `gapminder` package. We saw this dataset briefly in the Week 1 tutorial. Let's read in the data.

In [None]:
gap_data = pd.read_csv("../data/gapminder_data_unfiltered.csv")

In [None]:
## check data descriptions 

print(gap_data.info())

In [None]:
gap_data.describe()

In [None]:
# include all columns not just numeric data - by default the describe method includes only numeric data  
print(gap_data.describe(include = "all"))

In [None]:
## check df dtype of data frame columns
gap_data.dtypes

To slice or subset rows, we can use the same notation as with arrays `dataframe[start: stop: step]`

In [None]:
print(gap_data[1:3])

To access columns, we can use `[]` with the column name or a dot.

In [None]:
# get a column with [] 
# Notice meta information about names and data types at the bottom!
print(gap_data['year'])

In [None]:
# but quite frequently you would use a . dot notation, like this:
print(gap_data.year)

In [None]:
# to get individual items
print(gap_data.year[0])

# same as - just a different style of syntax 
print(gap_data['year'][0])

In [None]:
# or a few individual items
print(gap_data.year[0:5])

### 4.1 the Index 

The index is the most important part of your data. It should be unique, but does not have to be. 

If you do not specify the index in your data, `pandas` will just use continuous numbers starting from 0 (like 0,1,2,3,4,...). It's sort of like a row name in Excel.

```.set_index(a_column_name)``` will set a column with name `a_column_name` to be the index

```drop=False``` will make the old column stay (it will sort-of get duplicated and you'd have two identical columns: the original one, and the new index column)

You could also have many columns act as indexes, but we will not go into that. If you wanted to do that, just pass a List of column names to `set_index` rather than one column name.

In [None]:
# the index is the numbers on the left (0-3308)
# notice it does not have a column/variable name 
gap_data

In [None]:
# we can turn 1 of the columns into the index 
# notice the spacing difference between the column names and the index name 
gap_data.set_index("year")

In [None]:
# but notice above year is now gone. to keep that column as it was, add drop = False
gap_data_yindex = gap_data.set_index("year", drop = False)
gap_data_yindex

In [None]:
# so now you can use your index to get whole rows from the dataframe
# this can be bit cleaner than indexes 1,2,3,4... depending on your data
gap_data_yindex.loc[[1987]]

## see section below for what the .loc method does 
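
The same steps can be tried on a tiny data frame where the result is easy to check by eye. This is only a sketch: the data frame `toy` and its values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"year": [1987, 1987, 1992],
                    "country": ["A", "B", "A"],
                    "pop": [10, 20, 12]})

# use year as the index, keeping the original column too
toy_yindex = toy.set_index("year", drop=False)

# .loc selects by index label - every row labelled 1987 is returned
rows_1987 = toy_yindex.loc[[1987]]
print(rows_1987)

# .iloc selects by position - the first row, whatever its label
print(toy_yindex.iloc[0])

# the index does not have to be unique, and here it is not
print(toy_yindex.index.is_unique)
```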

### 4.2 Working with data frames 

When working with data frames, there are a wide variety of functionalities available to you through `pandas`. Here we will focus on some key functions and attributes: 

* `groupby()` to perform subgroup analysis on your data
* `reset_index()` to reset the index of your data frame. It is good practice to do this after `groupby()` in particular to avoid funky and unexpected results
* `assign()` to assign or calculate a new column. `assign()` *does not* change the original dataframe; it returns a new one, which is a special change of pace! It also has no `inplace` option: if you specified `inplace=True` it would just add a column called 'inplace' and put the value True in every row of that column. 
* `agg()` aggregate using one or more operations over a specified range of columns or rows 
* `isin()` alongside slicing to filter specific rows of a data frame
* `query()` to query or filter columns of a data frame with a boolean expression 
* `loc` to select rows and columns by labels or a boolean array, stands for location
* `iloc` to select rows and columns by integer-location indexing for selecting by position, stands for integer-location
* `df.astype()` to change the data type of a `pandas` object

You can also assign new values to a selection of your data frame using `loc`/`iloc`.

When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Additionally, in `pandas` you *cannot* use `or`/`and` but need to use the 'or' operator `|` and the 'and' operator `&`.

`pandas` data frames provide a full set of statistical methods as well. If you need something specific, always [look in the documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)

In [None]:
# mean of all populations in the dataset grouped by continent 

print(gap_data["pop"].groupby(gap_data["continent"]).mean())

In [None]:
# subset data to be only year and continent variables

gap_filter = gap_data.iloc[:, 0:2].copy()

## [:, 0:2] reads as everything or all rows, slicing columns to be only the first 2 following [row, columns]
## .copy() method to get a regular copy, to avoid the case where changing gap_filter also changes gap_data

gap_filter

In [None]:
# filter data to keep only rows from 1980 onwards (the condition below is year >= 1980)

gap_data.loc[gap_data["year"] >= 1980]

In [None]:
# filter data to be specific countries

gap_data.loc[gap_data["country"].isin(["Thailand", "Peru"])]
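
Filters can be combined with `&` and `|`, as long as each condition sits in its own parentheses. The sketch below uses a small made-up data frame called `toy` (not the gapminder data) and also shows the equivalent `query()` call:

```python
import pandas as pd

toy = pd.DataFrame({"country": ["A", "A", "B", "B"],
                    "year": [1980, 2000, 1980, 2000],
                    "population": [10, 15, 20, 30]})

# both conditions must hold: & with parentheses around each condition
both = toy.loc[(toy["year"] >= 1990) & (toy["population"] > 20)]
print(both)

# either condition may hold: |
either = toy.loc[(toy["country"] == "A") | (toy["population"] > 25)]
print(either)

# the same filter as `both`, written as a query string
same_as_both = toy.query("year >= 1990 and population > 20")
print(same_as_both)
```

Inside the `query()` string the plain words `and`/`or` are allowed, because `pandas` parses the expression itself.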

In [None]:
# get the mean of a column 

print(gap_data["gdpPercap"].mean())

## or you could use 
print(gap_data.gdpPercap.mean())

In [None]:
# create a new variable called mean_gdpPercap with the mean of the whole data set using assign 

gap_data.assign(mean_gdpPercap = gap_data["gdpPercap"].mean())

In [None]:
# let's try again and see how inplace does something unexpected, as mentioned above 

gap_data.assign(mean_gdpPercap = gap_data["gdpPercap"].mean(), inplace = True)
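
Because `assign()` returns a new data frame, you need to store the result if you want to keep the new column. A minimal sketch on a made-up data frame called `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"gdp": [1.0, 3.0]})

# assign() returns a new data frame with the extra column
with_mean = toy.assign(mean_gdp=toy["gdp"].mean())

print(list(toy.columns))        # the original is unchanged
print(list(with_mean.columns))  # the new data frame has the extra column
```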

In [None]:
# to get the mean population across all years per country we just first need to do a groupby to group the data by country 

gap_data["pop"].groupby(gap_data["country"]).mean()
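
The result of a `groupby()` uses the grouping column as its index. Calling `reset_index()` afterwards turns it back into an ordinary column, which is often easier to work with. A short sketch on a made-up data frame called `toy`:

```python
import pandas as pd

toy = pd.DataFrame({"continent": ["X", "X", "Y"],
                    "population": [10, 30, 50]})

# mean population per continent - continent becomes the index
mean_by_continent = toy.groupby("continent")["population"].mean()
print(mean_by_continent)

# reset_index() turns the index back into a regular column
tidy = mean_by_continent.reset_index()
print(tidy)

# several aggregations at once, using a dictionary with agg()
summary = toy.groupby("continent").agg({"population": ["min", "max"]})
print(summary)
```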

We can use our knowledge of the dictionary data structure to get a summary table of our data with aggregate statistics per column of interest: 

In [None]:
# to get a summary table with aggregations per column 

gap_data.agg({'year' : ['min', 'max'], 'pop' : ['sum', 'min', 'max']})

We can cast specific columns in a data frame to another dtype using `astype()` and a dictionary. Let's make `year` a category:

In [None]:
gap_data.astype({"year": "category"}).dtypes

# adding .dtypes at the end has Python print out the dtype of the data frame after changing the year dtype 
## in this way we can check our work 

### 4.3 Missing values in data frames

`pandas` treats `None` and `NaN` as essentially interchangeable for indicating missing or null values. To facilitate this convention, there are several useful functions for detecting, removing, and replacing null values in a `pandas` DataFrame:

* `isnull()` returns a data frame of Boolean values which are `True` for NaN values 
* `notnull()` returns a data frame of Boolean values which are `False` for NaN values 
* `df.dropna()` allows you to analyze and drop Rows/Columns with missing values in different ways. By default uses row-wise deletion so all rows with missing data are deleted
* `df.fillna()` allows you to replace missing values with some other specified value 
* `df.replace()` replaces a string, regex, list, dictionary, series, number, etc. from a `pandas` data frame
* `df.interpolate()` which uses various interpolation techniques to fill the missing values rather than hard-coding the value. This is particularly useful with numeric data although there are numerous methods available - [see the documentation if you are interested in learning more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html)

We looked at dealing with missing data in Weeks 2 & 3, but let's create a data frame with some missing data to remind ourselves how this works. 

In [None]:
# to create a DataFrame we put a Dict in its constructor. But remember that values need to be lists
patients = pd.DataFrame(
    {"names": ["Angela", "Shondra", np.nan, "Ben"],
     "age": [27, np.nan, 57, 44],
     "result": [True, False, np.nan, False]})

patients.info()

In [None]:
patients.isnull()

In [None]:
# to check for missing values in a specific column
patients.names.isnull()

In [None]:
patients.dropna()

In [None]:
patients.fillna("missing")
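
Filling every gap with the same text is rarely what you want for numeric columns. `fillna()` also accepts a dictionary so each column gets its own replacement, and `dropna()` can be limited to certain columns with `subset`. A small sketch on a made-up data frame called `toy`:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"name": ["Ann", np.nan, "Cy"],
                    "age": [30, np.nan, 50]})

# count the missing values in each column
print(toy.isnull().sum())

# a different fill value per column: text for name, the column mean for age
filled = toy.fillna({"name": "unknown", "age": toy["age"].mean()})
print(filled)

# drop rows only when the name is missing
kept = toy.dropna(subset=["name"])
print(kept)
```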

In [None]:
# fill missing data with the previous item (forward fill)
patients.ffill() ## same result as the older spelling patients.fillna(method = "ffill"), which recent pandas versions deprecate

In [None]:
# or fill the missing data with the next valid observation (back fill)
patients.bfill() # same result as the older spelling patients.fillna(method = "bfill")

In [None]:
# the default method is linear which applies only to numeric data
patients.interpolate(method = "linear")

In [None]:
patients.interpolate(method = "pad") # this method is the same as ffill; recent pandas versions deprecate it in favour of patients.ffill()

---

## You did it! 🎉 

Well done for making it to the end of this notebook. If you have not done so yet, move to the Week 4 data structures in R RMarkdown notebook next. 

⭐⭐⭐❓👣 Do not forget your 3 stars, a wish, and a step mini-diaries once you have completed the content for this week. 

---
*Dr. Brittany Blankinship (2024)*