## A Python Learning Journey

Throughout this notebook, I will post all my code for the different Advent of Code challenges. Since I am doing this for learning purposes, I will post my original code, successful or not, efficient or not, and I will look for better solutions from other programmers to broaden my awareness of libraries, methods, functions, best practices and so on. In the end, I expect this notebook to be helpful for my future self and, maybe, for other novice programmers doing these challenges too.

### Day 1

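My first attempt was really awful in terms of efficiency. I ran three nested for loops that seemed to run forever (the <code>break</code> only exits the innermost loop, so the outer loops keep going after a match), and I even had to stop the process. Luckily, the answer had already been found by then. This was the code: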

In [None]:
#import pandas as pd
#
#data = pd.read_csv('input.txt', header=None)
#data.columns = ['Value']
#data.iloc[0].values + 10
#
#for i, val in enumerate(data.Value):
#    for i2 in range(len(data)):
#        for i3 in range(len(data)):
#            if data.iloc[i].values + data.iloc[i2].values + data.iloc[i3].values== 2020:
#                entry1 = i
#                entry2 = i2
#                entry3 = i3
#                break
#            else:
#                continue
#
#check = data.iloc[entry1].values + data.iloc[entry2].values + data.iloc[entry3].values
#ans = data.iloc[entry1].values * data.iloc[entry2].values * data.iloc[entry3].values

The following code is from a colleague; I like it because it approaches the solution from a different perspective. You can also find it here: https://github.com/ed-hermoreyes/advent_of_code2020/blob/main/Advent-of-code-2020.ipynb

In [1]:
### PART ONE ####
with open('day1.txt','r') as doc:
    numbers = [int(i) for i in doc.read().splitlines()]
    
for num in numbers:
    if (2020-num) in numbers:
        print(num*numbers[numbers.index(2020-num)])
        
### PART TWO ###
from itertools import combinations

num_combi = combinations(numbers, 3)

def product(sequence):
    initial = 1
    for i in sequence:
        initial *= i
    return initial

for comb in num_combi:
    if sum(comb) == 2020:
        print(product(comb))

776064
776064
6964490


I really like the simplicity of the first part of the code. It made me think that more thought must be put into a coding problem in order to find other approaches when the first one is not efficient enough. Also, as always, searching the web or asking others for corrections and ideas helps. In the second part, I got to know the itertools module, definitely one that I will keep in mind.
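
As a minimal sketch of the same two ideas, the lookup in part one can use a set instead of scanning the list, and part two can lean on <code>math.prod</code> for the product. The function names and the sample list (taken from the puzzle's example) are only illustrative, not the colleague's code.

```python
from itertools import combinations
from math import prod


def find_pair_product(numbers, target=2020):
    """Return the product of the first pair that sums to target, or None."""
    seen = set()  # values visited so far; membership tests are O(1) on average
    for num in numbers:
        complement = target - num
        if complement in seen:
            return num * complement
        seen.add(num)
    return None


def find_triple_product(numbers, target=2020):
    """Return the product of the first triple that sums to target, or None."""
    for combo in combinations(numbers, 3):
        if sum(combo) == target:
            return prod(combo)  # stop at the first match
    return None


sample = [1721, 979, 366, 299, 675, 1456]
print(find_pair_product(sample))    # 514579
print(find_triple_product(sample))  # 241861950
```

Returning from a function as soon as a match is found also avoids printing the same pair twice and sidesteps the nested-loop <code>break</code> problem.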

Searching the web, I found this article on the use of for loops and some better alternatives for the cases where they are not that efficient: https://medium.com/python-pandemonium/never-write-for-loops-again-91a5a4c84baf

### Day 2

In [2]:
import pandas as pd

data = pd.read_csv('day2.txt', header=None, delimiter=' ')
data.columns = ['len','char','pass']
data['len'] = [x.split('-') for x in data['len']]
data['char'] = [x.strip(':') for x in data['char']]

### PART ONE ###

data['valid'] = 0
for i,row in data.iterrows():
    letter = row['char']
    count = row['pass'].count(letter)
    r = range(int(row.len[0]),int(row.len[1])+1)
    if count in r:
        data.loc[i, 'valid'] = 1
        
print('Result Part 1: ', data.valid.sum())

### PART TWO ###

data['valid2'] = 0
for i,row in data.iterrows():
    letter = row['char']
    p1 = int(row.len[0]) - 1
    p2 = int(row.len[1]) - 1
    if row['pass'][p1] == letter and row['pass'][p2] != letter:
        data.loc[i, 'valid2'] = 1
    elif row['pass'][p1] != letter and row['pass'][p2] == letter:
        data.loc[i, 'valid2'] = 1

print('Result Part 2: ', data.valid2.sum())

Result Part 1:  636
Result Part 2:  588


### Day 3

In [2]:
### PART ONE ###
text = open('day3.txt').readlines()
for i in range(len(text)):
    text[i] = text[i]*33
    text[i] = text[i].replace('\n', '')

position = 3
count = 0 
for i,line in enumerate(text):
    if i == 0:
        continue
    loc = line[position]
    if loc == '#':
        count += 1
    position += 3  

print(count)

244


I did not have many problems with the first part, but for the second task the code would run into a scalability problem because of the way line 4 (<code>text[i] = text[i]*33</code>) replicates the text. If a requested slope moves further to the right, the extra storage needed would reduce efficiency. Also, if different slopes are requested, the text would need to be replicated (as in line 4) for each slope.

Because of that, I looked on Reddit for some inspiration and saw that many people used the remainder of a division (the modulo operator) to wrap the horizontal position around the end of each line.
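
To illustrate the remainder idea on its own, here is a small sketch using the example grid from the puzzle statement. The function name <code>count_trees</code> is hypothetical, and the sketch also checks the starting cell, which is harmless here because that cell is open.

```python
def count_trees(grid, right, down):
    """Count '#' cells visited when moving `right` columns and `down` rows per step."""
    width = len(grid[0])
    count = 0
    # Visit every `down`-th row; `step` is how many moves have been made so far.
    for step, row in enumerate(range(0, len(grid), down)):
        col = (step * right) % width  # the remainder wraps the column around
        if grid[row][col] == '#':
            count += 1
    return count


example = [
    "..##.......",
    "#...#...#..",
    ".#....#..#.",
    "..#.#...#.#",
    ".#...##..#.",
    "..#.##.....",
    ".#.#.#....#",
    ".#........#",
    "#.##...#...",
    "#...##....#",
    ".#..#...#.#",
]
print(count_trees(example, right=3, down=1))  # 7
```

No copy of the text is needed, so the memory use stays the same for any slope.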

In [4]:
### PART TWO ###
import math
text = open('day3.txt').readlines()
for i in range(len(text)):
    text[i] = text[i].replace('\n', '')

def tree_count(text, right, down):
    '''
        Count the trees ('#') encountered in the text when moving RIGHT
        characters to the right and then DOWN lines down at every step,
        wrapping around the end of each line.
    '''    
    row = down
    char = right
    count = 0
    for i,line in enumerate(text):
        if i != row:
            continue
        # The remainder wraps the position around; it leaves char unchanged when char < len(line)
        location = line[char % len(line)]
        if location == '#':
            count += 1
        row += down
        char += right
    return count

slopes = [(1,1),(3,1),(5,1),(7,1),(1,2)]
trees_text = [tree_count(text, right=slope[0], down=slope[1]) for slope in slopes]
print(math.prod(trees_text))

9406609920


In this part I had a problem with the result for a while. The count of trees was fine, but I was using <code>numpy.prod()</code> for the last operation. It caused what is called an **integer overflow**, which according to [Wikipedia](https://en.wikipedia.org/wiki/Integer_overflow#:~:text=In%20computer%20programming%2C%20an%20integer,than%20the%20minimum%20representable%20value.) is:
> _In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is outside of the range that can be represented with a given number of digits – either higher than the maximum or lower than the minimum representable value._

Because of this, the function was returning a wrong number. One fix was to pass <code>dtype='int64'</code> to <code>numpy.prod()</code>; the cell above uses <code>math.prod()</code> instead, which works with Python's arbitrary-precision integers.
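
The effect can be sketched in plain Python by simulating what a signed 32-bit integer would keep of a product that is too large. This assumes the overflow came from a 32-bit default integer type, which the text does not confirm; the helper <code>wrap_to_int32</code> and the tree counts are hypothetical and not my actual results.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_to_int32(value):
    """Return what a signed 32-bit integer would hold after wrapping around."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


tree_counts = [60, 244, 70, 80, 40]  # hypothetical counts, for illustration only
exact = math.prod(tree_counts)       # Python ints never overflow

print(exact)                 # 3279360000
print(exact > INT32_MAX)     # True
print(wrap_to_int32(exact))  # -1015607296
```

A result that is suddenly negative or far too small is a typical sign of this kind of wraparound.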

### Day 4

In [1]:
### PART ONE ###
import pandas as pd
data = open('day4.txt').read()
pps = pd.DataFrame(data.split('\n\n'), columns=['info'],)
pps['valid'] = 0

for i, passport in enumerate(pps['info']):
    if 'byr' not in passport:
        continue
    elif 'iyr' not in passport:
        continue
    elif 'eyr' not in passport:
        continue
    elif 'hgt' not in passport:
        continue
    elif 'hcl' not in passport:
        continue
    elif 'ecl' not in passport:
        continue
    elif 'pid' not in passport:
        continue
    else:
        pps.loc[i, 'valid'] = 1  # .loc sets the cell directly and avoids chained assignment

sum(pps.valid)

228

In [3]:
### PART TWO ###
import re
import os

os.chdir(r"C:\Users\datds\Documents\GitHub\AoC2020")

data = open('day4.txt').read().split('\n\n')
valid = 0

for i,passport in enumerate(data):
    # BYR test: four digits, between 1920 and 2002
    byr = re.search(r'byr:([\d]{4})', passport)
    if byr is None:
        continue
    if not (1920 <= int(byr.groups()[0]) <= 2002):
        continue
    # IYR test: four digits, between 2010 and 2020
    iyr = re.search(r'iyr:([\d]{4})', passport)
    if iyr is None:
        continue
    if not (2010 <= int(iyr.groups()[0]) <= 2020):
        continue
    # EYR test: four digits, between 2020 and 2030
    eyr = re.search(r'eyr:([\d]{4})', passport)
    if eyr is None:
        continue
    if not (2020 <= int(eyr.groups()[0]) <= 2030):
        continue
    # HGT test: a number followed by cm (150-193) or in (59-76)
    hgt = re.search(r'hgt:([\d]+)(cm|in)', passport)
    if hgt is None:
        continue
    if hgt.groups()[1] == 'cm':
        if not (150 <= int(hgt.groups()[0]) <= 193):
            continue
    if hgt.groups()[1] == 'in':
        if not (59 <= int(hgt.groups()[0]) <= 76):
            continue
    # HCL test: a '#' followed by six characters 0-9 or a-f
    hcl = re.search(r'hcl:(\#[\da-f]{6})', passport)
    if hcl is None:
        continue
    # ECL test: one of the seven allowed colours
    ecl = re.search(r'ecl:(amb|blu|brn|gry|grn|hzl|oth)', passport)
    if ecl is None:
        continue
    # PID test: nine digits
    pid = re.search(r'pid:([\d]{9})', passport)
    if pid is None:
        continue
    valid += 1
print(valid)

176


The fourth challenge was useful for reviewing regex syntax. The [regex101](https://regex101.com/) builder was really helpful for this task: it allowed me to write and check my patterns more quickly.
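
One detail worth keeping in mind with <code>re.search</code> is that a pattern such as <code>pid:([\d]{9})</code> also matches the first nine digits of a longer number. A possible stricter check, sketched here with hypothetical helpers <code>parse_fields</code> and <code>pid_is_valid</code> and a made-up record, is to split the passport into fields and use <code>re.fullmatch</code> on each value. Whether this changes the count depends on the input.

```python
import re


def parse_fields(passport):
    """Split a passport record into a {key: value} dictionary."""
    return dict(item.split(':', 1) for item in passport.split())


def pid_is_valid(value):
    """True only when the whole value is exactly nine digits."""
    return re.fullmatch(r'\d{9}', value) is not None


record = 'pid:0123456789 ecl:blu\nbyr:1980'  # made-up record with a ten-digit pid
fields = parse_fields(record)

print(re.search(r'pid:([\d]{9})', record) is not None)  # True: nine of the ten digits match
print(pid_is_valid(fields['pid']))                      # False
print(pid_is_valid('012345678'))                        # True
```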

### Day 5

In [1]:
### PART ONE ###
import pandas as pd

seats = pd.read_csv('day5.txt', names=['code'])
seats['seat'] = None
seats['seat_id'] = 0

for i, code in enumerate(seats.code):
    row = list(range(1,129))
    for j in range(0,7):
        if code[j] == 'F':
            row = row[:int(len(row)/2)]
        else:
            row = row[int(len(row)/2):]
    col = list(range(1,9))
    for k in range(7,10):
        if code[k] == 'L':
            col = col[:int(len(col)/2)]
        else:
            col = col[int(len(col)/2):]
    
    seats.at[i , 'seat'] = (row[0] - 1, col[0] - 1)
    seats.at[i , 'seat_id'] = (row[0] - 1) * 8 + col[0] - 1 

max(seats.seat_id)

935

In [2]:
### PART TWO ###
s = seats.sort_values('seat_id').reset_index()

i = 1
while i < len(s):
    prev = s.seat_id[i - 1]
    act = s.seat_id[i]
    if act - prev == 2:
        print(act - 1)
        break
    i += 1

743


This problem wasn't difficult to solve with my previous knowledge of Python. But I learned two new accessors: <code>.at</code> and <code>.iat</code>, which locate a cell like <code>.loc</code> and <code>.iloc</code> do, but are used to get or set a single value ([here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html) is the documentation).
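
A different technique from the halving loops above, sketched here only as an alternative: each boarding pass can be read as a 10-bit binary number (F and L as 0, B and R as 1), and that number is already row * 8 + column. The function names are hypothetical; the sample code and its ID come from the puzzle statement.

```python
def seat_id(code):
    """Decode a boarding pass by reading it as a binary number."""
    bits = code.translate(str.maketrans('FBLR', '0101'))
    return int(bits, 2)  # equals row * 8 + column


def missing_seat(ids):
    """Return the first ID strictly between the smallest and largest that is absent."""
    taken = set(ids)
    for candidate in range(min(taken) + 1, max(taken)):
        if candidate not in taken:
            return candidate
    return None


print(seat_id('FBFBBFFRLR'))       # 357
print(missing_seat([5, 6, 8, 9]))  # 7
```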