## Part 1 problem statement

(Adapted from [Advent of Code 2021, day 1](https://adventofcode.com/2021/day/1))

You are given a report of depth measurements, like

```txt
199
200
208
210
200
207
240
269
260
263
```

The first order of business is to figure out how quickly the depth increases.

To do this, count **the number of times a depth measurement increases** from the previous measurement. (There is no measurement before the first measurement.)

In the example above, the changes are as follows:

```txt
199 (N/A - no previous measurement)
200 (increased)
208 (increased)
210 (increased)
200 (decreased)
207 (increased)
240 (increased)
269 (increased)
260 (decreased)
263 (increased)
```

In this example, there are 7 measurements that are larger than the previous measurement.

How many measurements are larger than the previous measurement in the input file `input.txt`?

_Using my input file, the result should be 1292._

In [2]:
# IMPORTANT: Set this to the correct path for you!
INPUT_FILE = "input.txt"

import pathlib
assert pathlib.Path(INPUT_FILE).exists()

### Baseline solution
The problem statement asks us to traverse the depth reports and to compare the current measurement with the previous one.

The underlying idea is that, when talking about a sequence (for example, a list of measurements), a relationship
of “previous” translates into subtracting 1 from the index at hand. Similarly, a relationship of “next” translates into adding 1 to the index.

The only thing we need to be careful about is staying within the boundaries of the sequence, so that adding 1 to an index, or subtracting 1 from it, still gives a valid index.

This translates directly into this solution:

In [3]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

count = 0
for i in range(1, len(depths)):
    if int(depths[i-1]) < int(depths[i]): # Compare the previous one with current
        count += 1

print(count)

1292


If we stick to the “compare with the previous” interpretation, then the indices that matter are `i - 1` (the previous item) and `i` (the current item); and, therefore, `i` must start at `1`.
This means we use `range(1, len(depths))`.

If we go with the “compare with the next” interpretation, then the indices that matter are `i` (the current item) and `i + 1` (the next item).
For that, our range needs to end earlier than `len(depths)`:

In [4]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

count = 0
for i in range(len(depths) - 1):
    if int(depths[i]) < int(depths[i+1]): # Compare the current with "the next"
        count += 1

print(count)

1292


### Free resources ASAP

When using a `with` statement to access a file, you know that your file is automatically closed when you leave the `with` statement.
You also know that the `with` statement is nice because it will still close the file if, for example, your code throws an error.
That's very convenient, and a lovely reason to use the `with` statement.

However, while you are inside the `with` statement, the file remains open and in use by the operating system.
This ties in with the Python practice of avoiding nesting whenever possible: here, less nesting means the file is freed sooner.
In other words, put as little code inside the `with` statement as possible.

In our case, because we use `.readlines` to read the whole file, we can leave the `with` statement immediately:
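A minimal sketch of that idea follows. It assumes a small sample file written to a temporary directory (the name `sample_path` and the sample data are illustrative stand-ins for the real `input.txt`):

```python
import pathlib
import tempfile

# Hypothetical sample file standing in for `input.txt`.
sample_path = pathlib.Path(tempfile.mkdtemp()) / "sample_input.txt"
sample_path.write_text("199\n200\n208\n210\n200\n207\n240\n269\n260\n263\n")

with open(sample_path, "r") as f:
    depths = f.readlines()  # The whole file is read here...

# ...so the file is already closed by the time we start counting.
file_closed = f.closed

count = 0
for i in range(1, len(depths)):
    if int(depths[i - 1]) < int(depths[i]):
        count += 1

print(file_closed, count)  # True 7
```

The counting loop sits at the top level, outside the `with` block, so the file is only held open for the single `.readlines` call.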

### The range of the length

Another frequent anti-pattern in Python is the excerpt `for i in range(len(...))`.
_Most of the time_, that `for` loop isn't what you really wanted to use.
Python has very powerful `for` loops, and the `for i in range(len(...))` is a pattern that we inherited from languages like C.
In Python, we tend to use built-ins like [`enumerate`](https://mathspp.com/blog/pydonts/enumerate-me) and [`zip`](https://mathspp.com/blog/pydonts/zip-up).

Another hint that `range(len(...))` is not the loop we want is that we don't really care about the indices.
Notice how `range(len(something))` gives you all the legal indices associated with `something`, but what we really care about are the elements.

A slight improvement would be to recognise the `enumerate` pattern:
`enumerate` is a good built-in to use if, in a `for` loop, you care about the current element _and_ about the current index you are using.
In our case, we care about the current index so that we can compute the index of the neighbouring element.
So, we could try writing something like this:

In [5]:
with open(INPUT_FILE) as f:
    depths = f.readlines()

count = 0
for i, num in enumerate(depths[:-1]):
    if int(num) < int(depths[i+1]):
        count += 1

print(count)

1292


In order to make this work, we are using a slice to ignore the last element from `depths`.
After all, the slice `[:-n]` means [“drop the last `n` elements”](https://mathspp.com/blog/pydonts/idiomatic-sequence-slicing#s-n-3).
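As a quick check of those slices, here is a small sketch on a short list (the values are taken from the example in the problem statement):

```python
depths = [199, 200, 208, 210, 200]

drop_last_one = depths[:-1]   # Drop the last element.
drop_last_two = depths[:-2]   # Drop the last two elements.
drop_first_one = depths[1:]   # Drop the first element.

print(drop_last_one)   # [199, 200, 208, 210]
print(drop_last_two)   # [199, 200, 208]
print(drop_first_one)  # [200, 208, 210, 200]
```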

Another interesting thought would be to try and simplify the `i + 1` part.
The built-in `enumerate` accepts a `start` argument that specifies where the counting starts:

In [6]:
list(enumerate('code'))

[(0, 'c'), (1, 'o'), (2, 'd'), (3, 'e')]

In [9]:
for next_idx, num in enumerate('code', start=1):
    print(next_idx, num)

1 c
2 o
3 d
4 e


Therefore, one might think that we could set `start=1` to avoid having to perform the sum.
If we do so, then we must be very explicit about what index we are using:

In [12]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

count = 0
for next_idx, num in enumerate(depths[:-1], start=1):
    if int(num) < int(depths[next_idx]):
        count += 1

print(count)

1292


However, I personally don't like this.
There is something here that makes me look for a better solution, although some might say I'm just being paranoid.
But, the truth is, our `if` statement is very asymmetrical right now.

The solution lies elsewhere!
The built-in `zip` is a better fit here, because `zip` is used to pair sequences up.
But what sequences do we want to pair up?
After all, we have a single sequence at hand!

As it turns out, both patterns, “this item & the next one” and “this item & the previous one”, are easily written with `zip`.
We just have to remember that, if `seq` is a sequence, then `seq[1:]` means “drop the first element” and `seq[:-1]` means “drop the last element”:

![](zip_pairwise.png)

In [21]:
s = 'coder'
print(s[:-1])
print(s[1:])
print(list(zip(s[:-1], s[1:])))

code
oder
[('c', 'o'), ('o', 'd'), ('d', 'e'), ('e', 'r')]


In [23]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

count = 0
for prev_, next_ in zip(depths[:-1], depths[1:]):
    if int(prev_) < int(next_):
        count += 1

print(count)

1292


In the above, I wrote the name `next_` because `next` is a built-in function.
Then, I decided to use `prev_` instead of `prev` just for symmetry.
You can pick any other two names you prefer, or use `for prev, next_ in ...`.

To simplify things a bit, especially when doing a similar thing with three or more iterables, we can actually omit the slices that are cutting from the end, because `zip` stops as soon as one iterable stops.
In other words, we don't need to specify `depths[:-1]`:

![](zip_stops_shortest.png)

With that in mind, we can remove the extra slice:

In [24]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

count = 0
for prev_, next_ in zip(depths, depths[1:]):
    if int(prev_) < int(next_):
        count += 1

print(count)

1292
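The claim that `zip` stops at the shortest iterable can be checked on the short string used earlier. This is only a small sketch; the variable names are illustrative:

```python
s = "coder"

with_slice = list(zip(s[:-1], s[1:]))
without_slice = list(zip(s, s[1:]))  # `zip` stops when `s[1:]` runs out.

print(without_slice)                # [('c', 'o'), ('o', 'd'), ('d', 'e'), ('e', 'r')]
print(with_slice == without_slice)  # True

# The same idea with three iterables: no end slices needed either.
triples = list(zip(s, s[1:], s[2:]))
print(triples)  # [('c', 'o', 'd'), ('o', 'd', 'e'), ('d', 'e', 'r')]
```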


### Repeated `int` conversions

If you look closely at the `for` loop we are writing, you will notice that most of the values in `depths` are going to be passed to the built-in `int` twice.
While that's not a terrible thing, it's _double_ the work we need.
All we need is for each number to be converted once, right?

Therefore, we can do the `int` conversion a bit earlier in the process:

In [25]:
with open(INPUT_FILE, "r") as f:
    depths = f.readlines()

depths = [int(d) for d in depths]
count = 0
for prev_, next_ in zip(depths, depths[1:]):
    if prev_ < next_:
        count += 1

print(count)

1292


Of course, now we have another issue of repeated work: first, we go over the whole file to read the lines with `.readlines`, and then we go over the file contents to convert everything into an integer.
We can do everything at once, if we convert the lines to integers _while_ we read them:

In [28]:
with open(INPUT_FILE, "r") as f:
    depths = [int(line) for line in f]

count = 0
for prev_, next_ in zip(depths, depths[1:]):
    if prev_ < next_:
        count += 1
    
print(count)

1292


In case you didn't know, [you can iterate over a file](https://mathspp.com/blog/til/006), which gives you its lines one by one.
That's what allowed us to convert all the lines into integers.

On top of that, you might be interested in knowing that [`int` is forgiving](https://twitter.com/mathsppblog/status/1466190674030698499), in that it allows the integers to be surrounded by whitespace:
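A short sketch of that behaviour, using made-up strings that mimic lines read from a file:

```python
a = int("199\n")      # Trailing newline, like a line read from a file.
b = int("  200  ")    # Spaces on both sides.
c = int("\t208\r\n")  # Tab in front, Windows-style line ending.

print(a, b, c)  # 199 200 208

# Whitespace alone is not a number, though: a blank line would fail.
try:
    int("\n")
    blank_line_fails = False
except ValueError:
    blank_line_fails = True

print(blank_line_fails)  # True
```

This is why the lines did not need a `.strip()` call before being converted.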

### A really long input file

Like I mentioned earlier, we need to consider if our input file fits into memory or not.
Up to now, we have been reading the whole file at once, but we don't need to!
We just saw we can iterate over `f` lazily, so we can leverage that for our own solution.

When we do that, notice that throughout our comparison loop we will need to be reading new values from the file.
Thus, if the file is large and we can't read all of it at once, we need to keep it open.
In other words, we have to indent our code again.

On top of that, because we are assuming the file is _very_ big, we can no longer create the list of `depths`!
Thus, we have two options:

 - we can write a _generator expression_ for `depths`; or
 - we can use a `map` with `map(int, f)`.

Using a generator expression entails converting the `[ ... ]` in the list comprehension to `( ... )`:

In [29]:
with open(INPUT_FILE, "r") as f:
    depths = (int(line) for line in f)

    count = 0
    for prev_, next_ in zip(depths, depths[1:]):
        if prev_ < next_:
            count += 1

    print(count)

TypeError: 'generator' object is not subscriptable

However, as we do so, the `depths[1:]` stops working because generators are not indexable/sliceable.
This shows a weakness in our solution!

Thankfully, there are two good solutions!
Starting with Python 3.10, there is a function called `itertools.pairwise` that implements the exact `zip` pattern we wanted:

If you are on a Python version older than 3.10, you can define your own `pairwise` using `itertools.tee`.

This isn't beginner-level Python, so feel free to skip this bit:

In [30]:
from itertools import tee

def pairwise(it):
    """Mock `itertools.pairwise` for Python versions below 3.10."""
    prev_, next_ = tee(it, 2)     # Split `it` into two iterables.
    next(next_, None)             # Advance once (safe if `it` is empty).
    yield from zip(prev_, next_)  # Yield the pairs.

In [31]:
with open(INPUT_FILE, "r") as f:
    depths = (int(line) for line in f)
    
    count = 0
    for prev_, next_ in pairwise(depths):
        if prev_ < next_:
            count += 1
    print(count)

1292


One thing I want you to understand, and that may go unnoticed because of the structure of the analysis above, is that using `itertools.pairwise` is a good idea regardless of whether we have this “really big file” or not.
In fact, `pairwise` is the tool to go for when you want to implement the pattern `zip(seq, seq[1:])`.

### Counting by incrementing

In all of the solutions above we have been counting by incrementing the `count` variable only when the depth test passes.
However, there is a different approach to this, inspired by the languages where Boolean values are just 0s and 1s.
Instead of checking with an `if`, we can just add the value of the condition to `count`:

In [32]:
with open(INPUT_FILE, "r") as f:
    depths = (int(line) for line in f)

    count = 0
    for prev_, next_ in pairwise(depths):
        count += prev_ < next_
    
    print(count)

1292


When `prev_ < next_` is `True` (and the `if` statement would pass the test, incrementing `count` by one), the statement `count += True` increments `count` by one.
When `prev_ < next_` is `False` (and the `if` statement would not increment `count`), the statement `count += False` increments `count` by zero.

After all, Boolean values can be treated as integers:

In [33]:
1 + True

2

In [34]:
1 + False

1

This has to be used with caution, though, and is not _always_ advisable.

Here's a short aside:
one context in which this works well is data science, when you're using pandas to analyse Boolean columns.
If you have a dataframe `df` and a Boolean column `df["bc"]`, then `df["bc"].sum()` uses this same principle to count how many rows are `True`.

### Summing the conditions

However, moving the condition to the value that is being incremented gives rise to another implementation.

After all, the pattern

```py
accumulator = 0
for element in iterable:
    accumulator += foo(element)
```

is equivalent to

```py
sum(foo(element) for element in iterable)
```
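A quick check of that equivalence, where `foo` is a hypothetical function chosen only for illustration:

```python
def foo(element):
    """Hypothetical stand-in: is the element even?"""
    return element % 2 == 0

iterable = [199, 200, 208, 210, 200]

# The explicit accumulation loop.
accumulator = 0
for element in iterable:
    accumulator += foo(element)

# The same computation written with `sum`.
total = sum(foo(element) for element in iterable)

print(accumulator, total)  # 4 4
```

Both versions start from zero and add one value per element, so they agree for any `foo` that returns numbers (or Booleans).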

Therefore, we can rewrite our solution to be a sum:

In [36]:
with open(INPUT_FILE, "r") as f:
    depths = (int(line) for line in f)

    count = sum(prev_ < next_ for prev_, next_ in pairwise(depths))
    print(count)

1292


## Part 2 problem statement

(Adapted from [Advent of Code 2021, day 1](https://adventofcode.com/2021/day/1))

Instead of comparing consecutive values, consider sums of a three-measurement sliding window. Here is the same example again:

```txt
199  A      
200  A B    
208  A B C  
210    B C D
200      C D E
207        D E F
240          E F G
269            F G H
260              G H
263                H
```

Start by comparing the first and second three-measurement windows. The measurements in the first window are marked A (199, 200, 208); their sum is 199 + 200 + 208 = 607. The second window is marked B (200, 208, 210); its sum is 618. The sum of measurements in the second window is larger than the sum of the first, so this first comparison **increased**.

Your goal now is to count **the number of times the sum of measurements in this sliding window increases** from the previous sum. So, compare A with B, then compare B with C, then C with D, and so on. Stop when there aren't enough measurements left to create a new three-measurement sum.

In the above example, the sum of each three-measurement window is as follows:

```txt
A: 607 (N/A - no previous sum)
B: 618 (increased)
C: 618 (no change)
D: 617 (decreased)
E: 647 (increased)
F: 716 (increased)
G: 769 (increased)
H: 792 (increased)
```

In this example, there are 5 sums that are larger than the previous sum.

Consider sums of a three-measurement sliding window. How many sums are larger than the previous sum in the input file `input.txt`?

_You should arrive at the answer 1262._