# 4. Solving the Problem

## Introduction

In the last lesson, we translated our thought process into code.  This led us to the following.

In [2]:
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

organized_nums = {}

# 1. Go through the numbers one by one
for number in numbers:

    # 2. If we have not seen the number before,
    if number not in organized_nums.keys():
        # place it in a new pile
        organized_nums[number] = []
        organized_nums[number].append(number)
    else:
        # 3. If we have seen the number before, place it in the pile with the previous number
        organized_nums[number].append(number)


organized_nums

{'2124443321': ['2124443321', '2124443321'],
 '2158861321': ['2158861321'],
 '8564659988': ['8564659988', '8564659988'],
 '3121100845': ['3121100845']}

Of course, there is a problem with the above: we haven't quite produced the output requested.

We have produced the following:

In [None]:
{'2124443321': ['2124443321', '2124443321'],
 '2158861321': ['2158861321'],
 '8564659988': ['8564659988', '8564659988'],
 '3121100845': ['3121100845']}

And we want to get to this.

In [None]:
{"2124443321": [0, 5], "8564659988": [2, 4]}

In this lesson, we'll work towards the solution.

### Getting to the solution

When we initially solved the problem, we chose a slightly easier problem than the one asked of us: we placed matching numbers into piles.  But we still need to:

1. Store the index of each number instead of the number itself, and
2. Only show the numbers that are duplicated.

We should tackle these issues one at a time, tackling the *easier* problem first.

### 1. Adding the index

Let's take another look at our code.

In [None]:
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

organized_nums = {}

for number in numbers:
    if number not in organized_nums.keys():
        organized_nums[number] = []
        organized_nums[number].append(number)
    else:
        organized_nums[number].append(number)

It seems that the issue is that instead of appending the `number`, we should append the *index* of the number.  The hard part is finding the index of an element as we move through a loop.  How do we do that?
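One tempting first attempt is the list method `index`.  The small experiment below is not part of the original lesson, and the variable name `first_match_indexes` is just an illustrative choice.  It suggests why `index` alone is not enough here: it reports only the position of the first match, so a repeated number never gets its later positions.

```python
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

# list.index returns the position of the FIRST match only,
# so the repeats at positions 4 and 5 are reported as 2 and 0.
first_match_indexes = [numbers.index(number) for number in numbers]
print(first_match_indexes)  # [0, 1, 2, 3, 2, 0]
```

So we need a way to know our current position in the loop itself, rather than searching the list again for each element.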

Again, let's [ask Google](https://www.google.com/search?q=python+find+index+of+element+in+loop&oq=python+find+index+of+element+in+loop&aqs=chrome..69i57j0i22i30.6909j0j9&sourceid=chrome&ie=UTF-8) how to accomplish this.  

> We see something about the enumerate function.

And then, because `enumerate` is somewhat foreign to us, we should quickly practice using it before incorporating it into our work.

In [None]:
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

for index, number in enumerate(numbers):
    print(index, number)

0 2124443321
1 2158861321
2 8564659988
3 3121100845
4 8564659988
5 2124443321
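To see exactly what each pass through the loop receives, we can also collect the results of `enumerate` into a list.  This is a small extra experiment rather than part of the original lesson, and the names `pairs` and `pairs_from_one` are only illustrative.

```python
numbers = ["2124443321", "2158861321", "8564659988"]

# Each item produced by enumerate is an (index, element) tuple.
pairs = list(enumerate(numbers))
print(pairs)

# The optional start argument changes the first index (the default is 0).
pairs_from_one = list(enumerate(numbers, start=1))
print(pairs_from_one)
```

Writing `for index, number in enumerate(numbers)` simply unpacks each of those tuples into two variables.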


Ok, so on each pass through the loop, `enumerate` gives us two variables: first the index and then the element.  Let's update our code.

In [None]:
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

organized_nums = {}
for index, number in enumerate(numbers):
    if number not in organized_nums.keys():
        organized_nums[number] = []
        organized_nums[number].append(index)
    else:
        organized_nums[number].append(index)

In [None]:
organized_nums

{'2124443321': [0, 5],
 '2158861321': [1],
 '8564659988': [2, 4],
 '3121100845': [3]}

And let's again compare our result with the solution expected of us.

In [None]:
{"2124443321": [0, 5], "8564659988": [2, 4]}

### 2. Only include duplicate elements

Ok, so we have only one step left: keeping only the duplicate elements.  We essentially have two options:

1. Pre-processing
2. Post-processing

1. Pre-processing

With pre-processing, before building the dictionary, we would first work out which numbers are duplicated.  Then, for each of the duplicated numbers, we would add its indexes to the dictionary.

2. Post-processing

With post-processing, we take our dictionary above and create a new dictionary that keeps only the key-value pairs whose list has more than one element.
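For comparison, here is a rough sketch of what the pre-processing option might look like.  It is not the approach the lesson takes, and it assumes we are willing to use `Counter` from the standard library's `collections` module; the names `counts`, `duplicated`, and `preprocessed` are hypothetical.

```python
from collections import Counter

numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

# Pre-processing: count occurrences, then keep only numbers seen more than once.
counts = Counter(numbers)
duplicated = {number for number, count in counts.items() if count > 1}

# Main step: record indexes only for the duplicated numbers.
preprocessed = {}
for index, number in enumerate(numbers):
    if number in duplicated:
        preprocessed.setdefault(number, []).append(index)

print(preprocessed)  # {'2124443321': [0, 5], '8564659988': [2, 4]}
```

Note that this version passes over the data twice and introduces a new tool, which is part of why filtering afterwards can feel simpler.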

Here post-processing seems easier.  So let's go with that.  Notice that we can use our problem solving techniques to accomplish this.  We'll copy the input and desired output below.

In [None]:
organized_nums = {'2124443321': [0, 5],
 '2158861321': [1],
 '8564659988': [2, 4],
 '3121100845': [3]}

# solution -> {"2124443321": [0, 5], "8564659988": [2, 4]}

And then we can solve this by selecting only those key-value pairs whose value is a list with more than one element in it, like so.

In [None]:
{k:v for k, v in organized_nums.items() if len(v) > 1}

{'2124443321': [0, 5], '8564659988': [2, 4]}
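If the dictionary comprehension looks unfamiliar, it may help to see the same filter written as an ordinary loop.  This is only an equivalent sketch for illustration; the name `repeats_only` is hypothetical.

```python
organized_nums = {'2124443321': [0, 5],
                  '2158861321': [1],
                  '8564659988': [2, 4],
                  '3121100845': [3]}

repeats_only = {}
# .items() gives us each key together with its value
for number, indexes in organized_nums.items():
    # keep the pair only if the number appeared more than once
    if len(indexes) > 1:
        repeats_only[number] = indexes

print(repeats_only)  # {'2124443321': [0, 5], '8564659988': [2, 4]}
```

The comprehension does the same work in a single expression.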

Now let's put all of our code together and wrap it in the function `find_repeat`.

In [None]:
numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

def find_repeat(numbers):
    organized_nums = {}
    for index, number in enumerate(numbers):
        if number not in organized_nums.keys():
            organized_nums[number] = []
            organized_nums[number].append(index)
        else:
            organized_nums[number].append(index)
    return {k:v for k, v in organized_nums.items() if len(v) > 1}

find_repeat(numbers)

{'2124443321': [0, 5], '8564659988': [2, 4]}

And we are done.
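As an optional refinement, notice that both branches of the `if` end by appending the index.  One way to remove that repetition, assuming we are comfortable with `defaultdict` from the standard library's `collections` module, is sketched below.  The function name `find_repeat_compact` and the variable `positions` are hypothetical, and this is an alternative rather than the lesson's solution.

```python
from collections import defaultdict

def find_repeat_compact(numbers):
    # defaultdict(list) creates an empty list the first time a key is accessed,
    # so we no longer need to check whether the number was seen before
    positions = defaultdict(list)
    for index, number in enumerate(numbers):
        positions[number].append(index)
    # keep only the numbers that appear at more than one index
    return {number: indexes for number, indexes in positions.items()
            if len(indexes) > 1}

numbers = ["2124443321", "2158861321",
           "8564659988", "3121100845",
           "8564659988", "2124443321"]

print(find_repeat_compact(numbers))  # {'2124443321': [0, 5], '8564659988': [2, 4]}
print(find_repeat_compact([]))       # {}
```

Checking an edge case such as the empty list is a quick way to gain confidence that the function behaves sensibly beyond the one example we were given.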

### Summary

In this lesson, we wrote the code to produce the desired output.  We used our problem solving technique of copying over the starting point and the ending point, and began by storing the index, rather than the number, in our lists.  When we learned something new with Google (`enumerate`), we first tried it out on its own before incorporating it into the larger function.

With our step of selecting only the numbers that appear at more than one index, we saw that we could accomplish this through pre-processing or post-processing.  With pre-processing, we would first remove any values that are not duplicated, and then create the dictionary.  And with post-processing, which we chose, we removed the non-duplicated items after the dictionary was created.  We chose the technique that we found easier.