--- Day 1: Historian Hysteria ---

The Chief Historian is always present for the big Christmas sleigh launch, but nobody has seen him in months! Last anyone heard, he was visiting locations that are historically significant to the North Pole; a group of Senior Historians has asked you to accompany them as they check the places they think he was most likely to visit.

As each location is checked, they will mark it on their list with a star. They figure the Chief Historian must be in one of the first fifty places they'll look, so in order to save Christmas, you need to help them get fifty stars on their list before Santa takes off on December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

You haven't even left yet and the group of Elvish Senior Historians has already hit a problem: their list of locations to check is currently empty. Eventually, someone decides that the best place to check first would be the Chief Historian's office.

Upon pouring into the office, everyone confirms that the Chief Historian is indeed nowhere to be found. Instead, the Elves discover an assortment of notes and lists of historically significant locations! This seems to be the planning the Chief Historian was doing before he left. Perhaps these notes can be used to determine which locations to search?

Throughout the Chief's office, the historically significant locations are listed not by name but by a unique number called the location ID. To make sure they don't miss anything, The Historians split into two groups, each searching the office and trying to create their own complete list of location IDs.

There's just one problem: by holding the two lists up side by side (your puzzle input), it quickly becomes clear that the lists aren't very similar. Maybe you can help The Historians reconcile their lists?

For example:
```
3   4
4   3
2   5
1   3
3   9
3   3
```

Maybe the lists are only off by a small amount! To find out, pair up the numbers and measure how far apart they are. Pair up the smallest number in the left list with the smallest number in the right list, then the second-smallest left number with the second-smallest right number, and so on.

Within each pair, figure out how far apart the two numbers are; you'll need to add up all of those distances. For example, if you pair up a 3 from the left list with a 7 from the right list, the distance apart is 4; if you pair up a 9 with a 3, the distance apart is 6.

In the example list above, the pairs and distances would be as follows:

- The smallest number in the left list is 1, and the smallest number in the right list is 3. The distance between them is 2.
- The second-smallest number in the left list is 2, and the second-smallest number in the right list is another 3. The distance between them is 1.
- The third-smallest number in both lists is 3, so the distance between them is 0.
- The next numbers to pair up are 3 and 4, a distance of 1.
- The fifth-smallest numbers in each list are 3 and 5, a distance of 2.
- Finally, the largest number in the left list is 4, while the largest number in the right list is 9; these are a distance 5 apart.

To find the total distance between the left list and the right list, add up the distances between all of the pairs you found. In the example above, this is 2 + 1 + 0 + 1 + 2 + 5, a total distance of 11!

Your actual left and right lists contain many location IDs. What is the total distance between your lists?

In [1]:
example = """
3   4
4   3
2   5
1   3
3   9
3   3
"""

expected_output_example = 11
print("Example:\n", example)
print(f"Expected output: {expected_output_example}")

Example:
 
3   4
4   3
2   5
1   3
3   9
3   3

Expected output: 11


## Solution P1

1. Parse the input into two lists
2. Sort each list
3. Calculate the distance (absolute difference) for each pair
4. Sum the distances
5. Return the total

> Hint: compute each distance with abs() to avoid negative values
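
To illustrate why the hint matters, here is a minimal sketch. The two short lists are made up for this illustration and are not part of the puzzle input. Without `abs()`, differences of opposite sign partly cancel and the total comes out too small.

```python
# Hypothetical two-pair input chosen for illustration (not from the puzzle)
left = sorted([1, 5])
right = sorted([3, 2])

# Without abs(): differences of opposite sign cancel each other out
signed_total = sum(a - b for a, b in zip(left, right))

# With abs(): every pair contributes a non-negative distance
absolute_total = sum(abs(a - b) for a, b in zip(left, right))

print(signed_total, absolute_total)  # 1 3
```

The pairs are (1, 2) and (5, 3): the signed differences are -1 and 2, while the distances are 1 and 2.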

### Approaches

### Basic
- Simple list and normal methods
- Use of: strip, split, sort, zip, sum

In [2]:
# 1. Parse the input into two lists
list_example = example.strip().split("\n")

list_1 = []
list_2 = []

for item in list_example:
    items = item.split("   ")
    list_1.append(int(items[0]))
    list_2.append(int(items[1]))

# 2. Sort the list
list_1.sort()
list_2.sort()

# 3. Calculate the distances
distances = [abs(item_1 - item_2) for item_1, item_2 in zip(list_1, list_2)]

# 4. Sum the distances
total = sum(distances)

# 5. Return the total
print(f"Total sum: {total} \n{'-'*25}\nExpected output: {expected_output_example} \nTotal: {total} \nIf {expected_output_example} == {total} => {expected_output_example == total}")

Total sum: 11 
-------------------------
Expected output: 11 
Total: 11 
If 11 == 11 => True


### Intermediate

- Using pandas 
- Use of: map, df, sort_values, abs, sum

In [3]:
import pandas as pd
from IPython.display import display

# 1. Convert the input into a DataFrame
data = [list(map(int,item.split("   "))) for item in list_example]

df = pd.DataFrame(data, columns=["items_1", "items_2"])

# 2. Sort values in the dataframe
df["items_1"] = df["items_1"].sort_values(ascending=True).values
df["items_2"] = df["items_2"].sort_values(ascending=True).values

# 3. Calculate the distances
df["distances"] = abs(df["items_1"] - df["items_2"])

# 4. Sum the distances
total = sum(df["distances"])

# 5. Return the total
display(df)
print(f"Total sum: {total} \n{'-'*25}\nExpected output: {expected_output_example} \nTotal: {total} \nIf {expected_output_example} == {total} => {expected_output_example == total}")

Unnamed: 0,items_1,items_2,distances
0,1,3,2
1,2,3,1
2,3,3,0
3,3,4,1
4,3,5,2
5,4,9,5


Total sum: 11 
-------------------------
Expected output: 11 
Total: 11 
If 11 == 11 => True


## Test Solution

Using the file day_1_input.txt, compute the total distance between the two lists.
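
Several cells in this notebook split each line on exactly three spaces, which matches the input file as provided. If the separator width varied or the file ended with a blank line, that split could fail. The sketch below shows one possible, more tolerant parser; `parse_columns` is a hypothetical helper name, not something used elsewhere in the notebook.

```python
def parse_columns(text):
    """Split two-column text into a pair of integer lists (hypothetical helper)."""
    left, right = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. one left by a trailing newline
        a, b = line.split()  # split on any run of whitespace
        left.append(int(a))
        right.append(int(b))
    return left, right


# Made-up sample with a blank line and an irregular separator
left, right = parse_columns("3   4\n4   3\n\n2 \t 5\n")
print(left, right)  # [3, 4, 2] [4, 3, 5]
```

Calling `str.split()` with no argument splits on any whitespace run, so the helper does not depend on the exact number of spaces.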

In [4]:
### BASIC

# 1. Read the text file into two lists
list_1 = []
list_2 = []

with open("day_1_input.txt", "r") as file:
    for line in file:
        # Split once on whitespace and unpack both columns
        left, right = line.split()
        list_1.append(int(left))
        list_2.append(int(right))

# 2. Sort the list
list_1.sort()
list_2.sort()

# 3. Calculate the distances
distances = [abs(item_1 - item_2) for item_1, item_2 in zip(list_1, list_2)]

# 4. Sum the distances
total = sum(distances)

# 5. Return the total
print(f"Total sum: {total}")

Total sum: 2264607


In [5]:
### INTERMEDIATE

# 1. Read the input file into a DataFrame
df = pd.read_csv("day_1_input.txt", sep="   ", header=None, engine='python')
df.columns = ["items_1", "items_2"]
# 2. Sort values in the dataframe
df["items_1"] = df["items_1"].sort_values(ascending=True).values
df["items_2"] = df["items_2"].sort_values(ascending=True).values

# 3. Calculate the distances
df["distances"] = abs(df["items_1"] - df["items_2"])

# 4. Sum the distances
total = sum(df["distances"])

# 5. Return the total
display(df)
print(f"Total sum: {total}")

Unnamed: 0,items_1,items_2,distances
0,10123,10180,57
1,10212,10321,109
2,10332,10684,352
3,10447,10738,291
4,10499,10820,321
...,...,...,...
995,99720,99914,194
996,99742,99914,172
997,99797,99914,117
998,99835,99914,79


Total sum: 2264607


--- Part Two ---

Your analysis only confirmed what everyone feared: the two lists of location IDs are indeed very different.

Or are they?

The Historians can't agree on which group made the mistakes or how to read most of the Chief's handwriting, but in the commotion you notice an interesting detail: a lot of location IDs appear in both lists! Maybe the other numbers aren't location IDs at all but rather misinterpreted handwriting.

This time, you'll need to figure out exactly how often each number from the left list appears in the right list. Calculate a total similarity score by adding up each number in the left list after multiplying it by the number of times that number appears in the right list.

Here are the same example lists again:
```
3   4
4   3
2   5
1   3
3   9
3   3
```

For these example lists, here is the process of finding the similarity score:

- The first number in the left list is 3. It appears in the right list three times, so the similarity score increases by 3 * 3 = 9.
- The second number in the left list is 4. It appears in the right list once, so the similarity score increases by 4 * 1 = 4.
- The third number in the left list is 2. It does not appear in the right list, so the similarity score does not increase (2 * 0 = 0).
- The fourth number, 1, also does not appear in the right list.
- The fifth number, 3, appears in the right list three times; the similarity score increases by 9.
- The last number, 3, appears in the right list three times; the similarity score again increases by 9.

So, for these example lists, the similarity score at the end of this process is 31 (9 + 4 + 0 + 0 + 9 + 9).

Once again consider your left and right lists. What is their similarity score?

In [6]:
expected_output_example = 31
print("Example:\n", example)
print(f"Expected output: {expected_output_example}")

Example:
 
3   4
4   3
2   5
1   3
3   9
3   3

Expected output: 31


## Solution P2

1. Parse the input into two lists
2. For each item in list 1, count how many times it appears in list 2
3. Multiply that count by the item's value
    - (total_item1_in_list2) * item1
4. Sum the similarity scores
5. Return the total

> Hint: you can loop and use count(), build a hash table, or count manually
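
The hash-table option from the hint is not shown in the cells below, so here is a minimal sketch of it using `collections.Counter` from the standard library. It uses the example lists from the puzzle text; the variable names are chosen for this illustration only.

```python
from collections import Counter

# Example lists from the puzzle description
left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]

# Count every right-hand value in a single pass over the list
right_counts = Counter(right)

# A Counter returns 0 for missing keys, so absent values add nothing
similarity = sum(value * right_counts[value] for value in left)

print(similarity)  # 31
```

Building the counts once avoids rescanning the right list for every left item, which is what `list.count` inside a loop does.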

### Approaches

### Basic
- Simple list and normal methods
- Use of: strip, split, count, sum

In [7]:
# 1. Parse the input into two lists
list_example = example.strip().split("\n")

list_1 = []
list_2 = []

for item in list_example:
    items = item.split("   ")
    list_1.append(int(items[0]))
    list_2.append(int(items[1]))

# 2. Count how often each list 1 item appears in list 2, multiplied by the item
counts = [list_2.count(item_1)*item_1 for item_1 in list_1]

# 4. Sum the similarity scores
total = sum(counts)

# 5. Return the total
print(f"Total sum: {total} \n{'-'*25}\nExpected output: {expected_output_example} \nTotal: {total} \nIf {expected_output_example} == {total} => {expected_output_example == total}")

Total sum: 31 
-------------------------
Expected output: 31 
Total: 31 
If 31 == 31 => True


### Intermediate

- Using pandas 
- Use of: map, df, lambda, apply, sum

In [8]:
# 1. Convert the input into a DataFrame
data = [list(map(int,item.split("   "))) for item in list_example]

df = pd.DataFrame(data, columns=["items_1", "items_2"])

# 2. Count how often each list 1 item appears in list 2, then multiply by the item
df["coincidence"] = df["items_1"].apply(lambda x: (df["items_2"] == x).sum())
df["similarity_score"] = df["coincidence"] * df["items_1"]

# 4. Sum the similarity scores
total = sum(df["similarity_score"])

# 5. Return the total
display(df)
print(f"Total sum: {total} \n{'-'*25}\nExpected output: {expected_output_example} \nTotal: {total} \nIf {expected_output_example} == {total} => {expected_output_example == total}")

Unnamed: 0,items_1,items_2,coincidence,similarity_score
0,3,4,3,9
1,4,3,1,4
2,2,5,0,0
3,1,3,0,0
4,3,9,3,9
5,3,3,3,9


Total sum: 31 
-------------------------
Expected output: 31 
Total: 31 
If 31 == 31 => True


## Test Solution P2

Using the file day_1_input.txt, compute the total similarity score.

In [9]:
### BASIC

# 1. Read the text file into two lists
list_1 = []
list_2 = []

with open("day_1_input.txt", "r") as file:
    for line in file:
        # Split once on whitespace and unpack both columns
        left, right = line.split()
        list_1.append(int(left))
        list_2.append(int(right))

# 2. Count how often each list 1 item appears in list 2, multiplied by the item
counts = [list_2.count(item_1)*item_1 for item_1 in list_1]

# 4. Sum the similarity scores
total = sum(counts)

# 5. Return the total
print(f"Total sum: {total}")

Total sum: 19457120


In [10]:
### INTERMEDIATE

# 1. Read the input file into a DataFrame
df = pd.read_csv("day_1_input.txt", sep="   ", header=None, engine='python')
df.columns = ["items_1", "items_2"]

# 2. Count how often each list 1 item appears in list 2, then multiply by the item
df["coincidence"] = df["items_1"].apply(lambda x: (df["items_2"] == x).sum())
df["similarity_score"] = df["coincidence"] * df["items_1"]

# 4. Sum the similarity scores
total = sum(df["similarity_score"])

# 5. Return the total
display(df)
print(f"Total sum: {total}")

Unnamed: 0,items_1,items_2,coincidence,similarity_score
0,27636,67663,0,0
1,92436,51410,0,0
2,68957,77912,0,0
3,36747,51149,0,0
4,30882,77912,0,0
...,...,...,...,...
995,50770,27829,14,710780
996,95962,15656,0,0
997,61245,14960,0,0
998,74434,46710,0,0


Total sum: 19457120
