First, we make the list of five digit primes since these are the possible guesses we can make in Primel and put it as ````prime_list````.

In [1]:
import pandas as pd
from math import *

In [2]:
five_digit_primes = [j for j in range(10000, 100001) if all(j % i != 0 for i in range(2, j//2))] 

In [3]:
(pd.Series(five_digit_primes, name = 'primes')).to_csv('prime_list.csv')
len(five_digit_primes)

8363

In [17]:
prime_list = pd.read_csv('prime_list.csv', index_col = 0).primes
prime_list.iloc[0:10] # the first 10 five digit primes

0    10007
1    10009
2    10037
3    10039
4    10061
5    10067
6    10069
7    10079
8    10091
9    10093
Name: primes, dtype: int64

Next, we segment each prime into individual digits to then compare the digits of a prime guess we choose to other primes.

We want to get the probability of each outcome (combination of greens, yellows, and greys) that a guess can produce (we will use these to get entropy of this guess later). To do this, we first compare our guess to each prime in the list to get the outcome that is produced when compared to each of them, and we encode this with 2 = green, 1 = yellow, and 0 = grey in ````get_guess_comparison````. Then, we will count the amount of times that outcome can appear in the whole prime list in ````get_count_bucket```` so we can get the probability of each outcome in ````get_probability_bucket````

For example, if we guess the number 10007, one outcome would be [GREEN, GREY, GREY, GREY, GREY], so we look for all the numbers that fit that outcome, having a 1 in the start, then we divide that by 8363 to get the probability of seeing that outcome. Since 1033 five digit primes start with 1, the probability of seeing this outcome is 1033/8363, or around 12.4%. 

In [6]:
def get_guess_comparison(prime_guess, prime_list): 
    
    # Compare our guess to each prime in the whole list for yellow, green, or grey
    # Returns a list of all the primes with a 2 for greens (same number in same spot), 
    # 1 for yellows (number in that spot exists in diff spot)
    # and 0 for greys (number in that spot isnt in either number)
    
    updated_prime_list= [[0,0,0,0,0] for p in range(0, len(prime_list))]
    
    # Split guess into digits
    guess_digits = [(prime_guess//10**(5-digit))%10 for digit in range(1, 6)] 

    for i, prime in enumerate(prime_list):
        
        # Split each prime into individual digits
        prime_digits = [(prime//10**(5-digit))%10 for digit in range(1, 6)]
        
        for j, prime_digit in enumerate(prime_digits):
            guess_digit = guess_digits[j]
            location = str(prime).find(str(guess_digit))
            # Put a 2 in new prime list if guess digit matches prime digit (green)
            if (prime_digit == guess_digit): 
                updated_prime_list[i][j] = 2
            elif(location != -1): # does the guess digit exist inside that prime? 
                # Is that digit not already green (2)?
                # If so, turn it yellow (1); 
                updated_prime_list[i][location] = 1 if updated_prime_list[i][location] != 2 else 2
    return updated_prime_list

A result from calling this function may look like the below, which pretends the guess is 10007.

In [7]:
for i, p in enumerate(get_guess_comparison(prime_list[0], prime_list)[0:10]):
    print(prime_list[i], p, i) 

10007 [2, 2, 2, 2, 2] 0
10009 [2, 2, 2, 2, 0] 1
10037 [2, 2, 2, 0, 2] 2
10039 [2, 2, 2, 0, 0] 3
10061 [2, 2, 2, 0, 0] 4
10067 [2, 2, 2, 0, 2] 5
10069 [2, 2, 2, 0, 0] 6
10079 [2, 2, 2, 1, 0] 7
10091 [2, 2, 2, 0, 0] 8
10093 [2, 2, 2, 0, 0] 9


In [10]:
def get_outcome_list():
    # Get all possible outcomes of green, yellow, and grey combinations, 3^5 possibilities

    outcomes = []

    for j in range(0, 3): 
        for k in range(0, 3):
            for l in range(0, 3):
                for m in range(0, 3):
                    for n in range(0, 3):
                        outcomes.append([j, k, l, m, n])
    return outcomes

In [11]:
outcomes = get_outcome_list() # saving outcomes for use throughout the notebook

In [13]:
def get_count_bucket(new_prime_list):     
    # Count the amount of times every outcome happens for a given guess comparison list (2,1,0 options)
    
    count_bucket = [0 for i in range(0, 3**5)]

    for prime in new_prime_list:
        for j, outcome in enumerate(outcomes):
            if (prime == outcome):
                count_bucket[j] += 1
                break
    return count_bucket

In [20]:
def get_probability_bucket(count_bucket, prime_list):   
    # Get probabilities of each outcome
    
    probability_bucket = [-1 for i in range(0, len(count_bucket))] 
    
    for i in range(0, len(probability_bucket)):
        probability_bucket[i] = count_bucket[i]/len(prime_list)
        
    return probability_bucket

Finally, we can get the entropy for this guess via the well known formula for entropy $\sum_x -p(x)\log_2(\frac{1}{p(x)})$ , where the log is the information gained and p(x) is the probability. Summing this culmulative probability distribution will give the amount of times on average the outcome will split the data in half.


In [31]:
def get_entropy(probability_bucket):
    entropy = 0
    for probability in probability_bucket:
        if probability != 0:
            entropy += probability*log(1/probability, 2)
    return entropy

Lets look at the entropy for the first number in our list, 10007. How much bits of information do we get from it?

In [32]:
guess_comparison = get_guess_comparison(prime_list[0], prime_list)
count_bucket = get_count_bucket(guess_comparison)
probability_bucket = get_probability_bucket(count_bucket, prime_list)
get_entropy(probability_bucket)

5.17643718011112

What is the maximum entropy needed to be certain we can guess the correct prime? To get this, we divide the prime list by 2 until we get to 1, as done below with a log:

In [33]:
total_entropy = log(len(prime_list), 2)
total_entropy

13.029804847630698

We need to split the data in half 13.03 times to be certain we have a solution (we have all information we can possibly get). What is the best entropy we could get for a first guess? If every outcome has equal liklihood, we will be able to split the data in half the most amount of times from any guess, so this will give the maximum entropy, which is calculated below

In [34]:
uniform_probability_bucket_two = [0 for i in range(0,3**5)]
for i in range(0,3**5):
    uniform_probability_bucket_two[i] = 1/3**5
best_entropy = get_entropy(uniform_probability_bucket_two)
best_entropy

7.924812503605761

So the best possible guess would be one with entropy 7.92, dividing 8363 $2^{7.92}$ times, if it were possible to have every outcome equally likely.

Finally, lets look for the number with the most entropy with the max() function after looping through each prime number as a guess and comparing that prime to all other primes in the list, storing each entropy value in a list while calling this nested function

In [None]:
entropy_list=[]
for prime in prime_list:
    guess_comparison = get_guess_comparison(prime, prime_list)
    count_bucket = get_count_bucket(guess_comparison)
    probability_bucket = get_probability_bucket(count_bucket, prime_list)
    entropy_list.append(probability_bucket)

In [216]:
best_first_entropy = max(entropy_list)
for i, entropy in enumerate(entropy_list):
    if (best_first_entropy == entropy:
        print("The number with the best entropy is {} with entropy {}!".format(prime_list[i], entropy_list[i]), '\n')
        
entropy_csv = pd.DataFrame({'entropy': entropy_list}, index = prime_list).sort_values(by = 'entropy', ascending = False)
entropy_csv.to_csv('entropy_list_one.csv')

The number with the best entropy is 10273 with entropy 6.632153265603857! 



ValueError: Length of values (69) does not match length of index (8363)

We got the number with the best entropy as a first guess, but we are only factoring information from a single guess. We can also factor in the information obtained from a second guess after the first one. To do this, we look at an outcome for our first guess, then see the possible primes that have that outcome. After that, we use that smaller list of primes and see the probability of getting each given our first outcome (divide by the length of smaller prime list), then get the entropy and add it to the first outcome.

In [20]:
def get_primes_in_each_outcome(prime_guess, prime_list):
    comparison_primes = pd.Series(get_guess_comparison(prime_guess, prime_list))

    actual_primes = [[] for i in range(0, len(outcomes))]
    
    for i, comparison_prime in enumerate(comparison_primes):
        location = outcomes.index(comparison_prime)
        if location != -1:
            prime = prime_list[i]
            actual_primes[location].append(prime)
            
    actual_primes = pd.Series(actual_primes, name = 'actual_primes')
    return actual_primes

In [21]:
get_primes_in_each_outcome(prime_list[0], prime_list)

0      [22229, 22259, 22283, 22343, 22349, 22369, 224...
1      [22291, 22381, 22391, 22441, 22481, 22531, 225...
2      [22247, 22277, 22367, 22397, 22447, 22567, 226...
3      [22273, 22279, 22511, 22573, 22613, 22619, 226...
4      [22271, 22571, 22871, 23371, 23671, 23971, 243...
                             ...                        
238                                                   []
239                                                   []
240                                              [10009]
241                                                   []
242                                              [10007]
Name: actual_primes, Length: 243, dtype: object

Now, we get the probability of the next outcome after the first, so we look at each outcome's possibilities after the first guess and from that list of possibilities, we get the probability of picking the next one.

Ideally, we would look at the information of a second guess for all possible primes, but this is very long computation, so I take the top entropies for a first guess and look at how those informations change when we look at a second guess.

In [22]:
# Get actual amount of info for a particular outcome (not average) for a guess
def get_info_for_outcome(prime_guess, prime_list):
    
    counts = get_count_bucket(get_guess_comparison(prime_guess, prime_list))
    information_gained = ["NaN" for i in range(0, len(outcomes))] 
    for i, outcome in enumerate(outcomes):
        if (counts[i] != 0): # if outcome is not possible for that guess, return NaN
            information_gained[i] = log(len(prime_list)/counts[i], 2) 
            # for one possibility left in outcome, you already have all info, so returns 0
    return information_gained


To get the max entropies for each second possible outcome, the code is below:

In [23]:
def get_second_entropy_dataframe(top_prime_list, first_entropy_series):
    # Gets the second entropy for each possible outcome of a first prime, aka 243 data points for each prime
    # finds the best second prime for every possible first guess outcome
    
    empty_list = [[] for l in range(0, len(outcomes))]
    for j, prime in enumerate(top_prime_list): # Go through each prime to be the first guess
        first_guess_entropy = first_entropy_series.loc[prime][0]
        first_guess_comparison = get_guess_comparison(prime, prime_list)
        first_guess_probabilty_list = get_probability_bucket(get_count_bucket(first_guess_comparison), prime_list)
        if j == 0:
            top_dataframe = pd.DataFrame({"first_prime": [prime for l in range(0, len(outcomes))] ,
                                            "first_outcome": outcomes,
                                            "corresponding_second_prime": empty_list,
                                            "first_info": get_info_for_outcome(prime, prime_list), # uses full prime list
                                            "first_entropy": [first_guess_entropy for l in range(0, len(outcomes))],
                                            "second_entropy": empty_list,
                                            "total_entropy": empty_list,
                                            "first_outcome_probability": first_guess_probabilty_list})
        else:
            next_dataframe = pd.DataFrame({"first_prime": [prime for l in range(0, len(outcomes))] ,
                                            "first_outcome": outcomes,
                                            "corresponding_second_prime": empty_list,
                                            "first_info": get_info_for_outcome(prime, prime_list),
                                            "first_entropy": [first_guess_entropy for l in range(0, len(outcomes))],
                                            "second_entropy": empty_list,
                                            "total_entropy": empty_list,
                                            "first_outcome_probability": first_guess_probabilty_list,})
            
        for i, outcome in enumerate(outcomes): # Look at each possible outcome for that first guess
            prime_options = get_primes_in_each_outcome(prime, prime_list)[i] # Gets actual prime possibilites in that outcome
            second_entropy_possibilities = []

            for k in range(0, len(prime_options)): # Get entropy of that outcome
                compared_list = get_guess_comparison(prime_options[k], prime_options)
                counted_outcomes = get_count_bucket(compared_list)
                probability = get_probability_bucket(counted_outcomes, prime_options)
                entropy = get_entropy(probability)
                second_entropy_possibilities.append(entropy)
                
            if j == 0:  # create top data frame
                if(second_entropy_possibilities != []): # Check that its not empty so we can take a max
                    best_second_entropy = max(second_entropy_possibilities) # Gets best second entropy for this outcome
                    index = second_entropy_possibilities.index(best_second_entropy)
                    top_dataframe.corresponding_second_prime[i] = prime_options[index]
                    top_dataframe.second_entropy[i] = best_second_entropy
                else: # First outcome is not possible with first guess 
                    top_dataframe.second_entropy[i] = 0
                    top_dataframe.corresponding_second_prime[i] = "NaN" 
                top_dataframe.total_entropy[i] = top_dataframe.first_entropy[i] + top_dataframe.second_entropy[i]
            else: # creates next data frame to add to top dataframe
                if(second_entropy_possibilities != []):
                    best_second_entropy = max(second_entropy_possibilities)
                    index = second_entropy_possibilities.index(best_second_entropy)
                    next_dataframe.corresponding_second_prime[i] = prime_options[index]
                    next_dataframe.second_entropy[i] = best_second_entropy
                else:
                    next_dataframe.second_entropy[i] = 0
                    next_dataframe.corresponding_second_prime[i] = "NaN"
                next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
                
        if j != 0:
            top_dataframe = top_dataframe.append(next_dataframe, ignore_index=True)
            
        print("RESULT OF THIS ITERATION: ", j, top_dataframe)
    return top_dataframe

In [24]:
first_top_entropies= pd.read_csv('entropy_list_one', index_col = 0)
top_primes = first_top_entropies.index

In [None]:
for df in range(0, len(prime_list), 100): # Saving dataframes 100 primes at a time (takes about an hr each)
    if df < 8300:
        dataframe_x = get_second_entropy_dataframe(top_primes[df:df + 100], first_top_entropies)
        dataframe_x.to_csv("{}to{}_secondentropy_df".format(df, df + 99))
        print('\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Current segment: {} to {}'.format(df+100,df+199))
    else: # 8363 primes, this is for the last 63
        dataframe_x = get_second_entropy_dataframe(top_primes[8300:len(top_primes)], first_top_entropies)
        dataframe_x.to_csv("8300to{}_secondentropy_df".format(len(top_primes)))

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_dataframe.total_entropy[i] = top_dataframe.first_entropy[i] + top_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_gu

RESULT OF THIS ITERATION:  0      first_prime    first_outcome corresponding_second_prime first_info  \
0          72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1          72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2          72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3          72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4          72227  [0, 0, 0, 1, 1]                        NaN        NaN   
..           ...              ...                        ...        ...   
238        72227  [2, 2, 2, 1, 1]                        NaN        NaN   
239        72227  [2, 2, 2, 1, 2]                        NaN        NaN   
240        72227  [2, 2, 2, 2, 0]                      72221  11.444842   
241        72227  [2, 2, 2, 2, 1]                        NaN        NaN   
242        72227  [2, 2, 2, 2, 2]                      72227  13.029805   

     first_entropy second_entropy total_entropy  first_outcome_probabi

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  1      first_prime    first_outcome corresponding_second_prime first_info  \
0          72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1          72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2          72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3          72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4          72227  [0, 0, 0, 1, 1]                        NaN        NaN   
..           ...              ...                        ...        ...   
481        32233  [2, 2, 2, 1, 1]                        NaN        NaN   
482        32233  [2, 2, 2, 1, 2]                        NaN        NaN   
483        32233  [2, 2, 2, 2, 0]                      32237  13.029805   
484        32233  [2, 2, 2, 2, 1]                        NaN        NaN   
485        32233  [2, 2, 2, 2, 2]                      32233  13.029805   

     first_entropy second_entropy total_entropy  first_outcome_probabi

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  2      first_prime    first_outcome corresponding_second_prime first_info  \
0          72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1          72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2          72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3          72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4          72227  [0, 0, 0, 1, 1]                        NaN        NaN   
..           ...              ...                        ...        ...   
724        33223  [2, 2, 2, 1, 1]                        NaN        NaN   
725        33223  [2, 2, 2, 1, 2]                        NaN        NaN   
726        33223  [2, 2, 2, 2, 0]                        NaN        NaN   
727        33223  [2, 2, 2, 2, 1]                        NaN        NaN   
728        33223  [2, 2, 2, 2, 2]                      33223  13.029805   

     first_entropy second_entropy total_entropy  first_outcome_probabi

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  3      first_prime    first_outcome corresponding_second_prime first_info  \
0          72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1          72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2          72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3          72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4          72227  [0, 0, 0, 1, 1]                        NaN        NaN   
..           ...              ...                        ...        ...   
967        27277  [2, 2, 2, 1, 1]                        NaN        NaN   
968        27277  [2, 2, 2, 1, 2]                        NaN        NaN   
969        27277  [2, 2, 2, 2, 0]                      27271  13.029805   
970        27277  [2, 2, 2, 2, 1]                        NaN        NaN   
971        27277  [2, 2, 2, 2, 2]                      27277  13.029805   

     first_entropy second_entropy total_entropy  first_outcome_probabi

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  4       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
1210        72277  [2, 2, 2, 1, 1]                        NaN        NaN   
1211        72277  [2, 2, 2, 1, 2]                        NaN        NaN   
1212        72277  [2, 2, 2, 2, 0]                      72271  13.029805   
1213        72277  [2, 2, 2, 2, 1]                        NaN        NaN   
1214        72277  [2, 2, 2, 2, 2]                      72277  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  5       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
1453        22777  [2, 2, 2, 1, 1]                        NaN        NaN   
1454        22777  [2, 2, 2, 1, 2]                        NaN        NaN   
1455        22777  [2, 2, 2, 2, 0]                        NaN        NaN   
1456        22777  [2, 2, 2, 2, 1]                        NaN        NaN   
1457        22777  [2, 2, 2, 2, 2]                      22777  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  6       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
1696        55511  [2, 2, 2, 1, 1]                        NaN        NaN   
1697        55511  [2, 2, 2, 1, 2]                        NaN        NaN   
1698        55511  [2, 2, 2, 2, 0]                        NaN        NaN   
1699        55511  [2, 2, 2, 2, 1]                        NaN        NaN   
1700        55511  [2, 2, 2, 2, 2]                      55511  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  7       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
1939        53353  [2, 2, 2, 1, 1]                        NaN        NaN   
1940        53353  [2, 2, 2, 1, 2]                        NaN        NaN   
1941        53353  [2, 2, 2, 2, 0]                      53359  13.029805   
1942        53353  [2, 2, 2, 2, 1]                        NaN        NaN   
1943        53353  [2, 2, 2, 2, 2]                      53353  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  8       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
2182        72727  [2, 2, 2, 1, 1]                        NaN        NaN   
2183        72727  [2, 2, 2, 1, 2]                        NaN        NaN   
2184        72727  [2, 2, 2, 2, 0]                        NaN        NaN   
2185        72727  [2, 2, 2, 2, 1]                        NaN        NaN   
2186        72727  [2, 2, 2, 2, 2]                      72727  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  9       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
2425        55333  [2, 2, 2, 1, 1]                        NaN        NaN   
2426        55333  [2, 2, 2, 1, 2]                        NaN        NaN   
2427        55333  [2, 2, 2, 2, 0]                      55331  11.444842   
2428        55333  [2, 2, 2, 2, 1]                        NaN        NaN   
2429        55333  [2, 2, 2, 2, 2]                      55333  13.029805   

      first_entropy second_entropy total_entropy  first_ou

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  10       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
2668        35353  [2, 2, 2, 1, 1]                        NaN        NaN   
2669        35353  [2, 2, 2, 1, 2]                        NaN        NaN   
2670        35353  [2, 2, 2, 2, 0]                        NaN        NaN   
2671        35353  [2, 2, 2, 2, 1]                        NaN        NaN   
2672        35353  [2, 2, 2, 2, 2]                      35353  13.029805   

      first_entropy second_entropy total_entropy  first_o

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  11       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
2911        18181  [2, 2, 2, 1, 1]                        NaN        NaN   
2912        18181  [2, 2, 2, 1, 2]                        NaN        NaN   
2913        18181  [2, 2, 2, 2, 0]                        NaN        NaN   
2914        18181  [2, 2, 2, 2, 1]                        NaN        NaN   
2915        18181  [2, 2, 2, 2, 2]                      18181  13.029805   

      first_entropy second_entropy total_entropy  first_o

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.corresponding_second_prime[i] = prime_options[index]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.second_entropy[i] = best_second_entropy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  next_dataframe.total_entropy[i] = next_dataframe.first_entropy[i] + next_dataframe.second_entropy[i]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

RESULT OF THIS ITERATION:  12       first_prime    first_outcome corresponding_second_prime first_info  \
0           72227  [0, 0, 0, 0, 0]                      16493   1.750775   
1           72227  [0, 0, 0, 0, 1]                        NaN        NaN   
2           72227  [0, 0, 0, 0, 2]                      13697   3.366247   
3           72227  [0, 0, 0, 1, 0]                      56179   4.830133   
4           72227  [0, 0, 0, 1, 1]                        NaN        NaN   
...           ...              ...                        ...        ...   
3154        35533  [2, 2, 2, 1, 1]                        NaN        NaN   
3155        35533  [2, 2, 2, 1, 2]                        NaN        NaN   
3156        35533  [2, 2, 2, 2, 0]                      35531  12.029805   
3157        35533  [2, 2, 2, 2, 1]                        NaN        NaN   
3158        35533  [2, 2, 2, 2, 2]                      35533  13.029805   

      first_entropy second_entropy total_entropy  first_o

Next, we will take each prime in this dataframe and average the total entropy (and/or second entropy) to get the number with the best total entropy on average despite its first outcome. if we took the max of second entropy, that wouldn't be very helpful because that entropy only occurs if we get a certain outcome (see below). 

To get a best second guess, we have this data set already, so once we find the best first guess, we will take the max second_entropy of that prime to see which prime would be best to pick next for the most information. The dataframe doesn't include best theoretical second entropy (putting all primes in the list to have uniform probability) because it depends on the outcome and what possible outcomes are left, although we could do this, this statistic doesnt really matter at this point (you could in the future without too much difficulty, then average all the outcomes best theoretical to get a overall best theoretical entropy for each prime, but this isnt useful for generating an actual second guess).

To get the best first guess with information about second guesses, we use the average of all outcomes entropy to see which first guess gives the most information on average despite its outcome (some outcomes arent possible-- simply remove the NaN rows to not consider and mess up results). ISNTEAD, TAKE THE PROBABILITY OF EACH OUTCOME OF THE 234 FOR THE FIRST PRIME AND SUM EACH PROBABILITY*SECOND_ENTROPY -- this will give you the second entropy you will most likely get, to which we can get the total entropy we expect to get from that prime by adding first+second (if i were to do this for a third guess, id take the second entropy probabilities for each outcome and sum the third entropy times that probability

you would expect a prime like 53917 to be best because it gives every possible end digit, [0,0,0,0,0] is not possible, but no.

In [29]:
full_data = pd.DataFrame({})

for df_num in range(0, len(top_primes), 100):
    
    if df_num < 8300:
        temp_df = pd.read_csv("{}to{}_secondentropy_df".format(df_num, df_num + 99), index_col = 0)
    else:
        temp_df = pd.read_csv("8300to{}_secondentropy_df".format(len(top_primes)), index_col = 0)
        
    full_data = full_data.append(temp_df, ignore_index = True)
    
full_data.to_csv("full_primel_data")

  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.

  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.

  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.append(temp_df, ignore_index = True)
  full_data = full_data.

In [26]:
full_data

Unnamed: 0,first_prime,first_outcome,corresponding_second_prime,first_info,first_entropy,second_entropy,total_entropy,first_outcome_probability
0,12539,"[0, 0, 0, 0, 0]",70487.0,6.196915,6.701542,5.572470,12.274012,0.013631
1,12539,"[0, 0, 0, 0, 1]",80473.0,5.315559,6.701542,6.509450,13.210992,0.025111
2,12539,"[0, 0, 0, 0, 2]",90679.0,5.099068,6.701542,5.666062,12.367604,0.029176
3,12539,"[0, 0, 0, 1, 0]",67411.0,6.196915,6.701542,5.573775,12.275317,0.013631
4,12539,"[0, 0, 0, 1, 1]",68491.0,5.637487,6.701542,6.211996,12.913538,0.020088
...,...,...,...,...,...,...,...,...
2032204,88883,"[2, 2, 2, 1, 1]",,,3.868239,0.000000,3.868239,0.000000
2032205,88883,"[2, 2, 2, 1, 2]",,,3.868239,0.000000,3.868239,0.000000
2032206,88883,"[2, 2, 2, 2, 0]",,,3.868239,0.000000,3.868239,0.000000
2032207,88883,"[2, 2, 2, 2, 1]",,,3.868239,0.000000,3.868239,0.000000


In [27]:
full_data["prob_times_second_entropy"] = full_data.first_outcome_probability * full_data.second_entropy

data_no_nan = full_data.dropna() # Dropping impossible outcomes for each number

# Get prime with most entropy as a 2nd guess for each number in prime list 
second_prime_index = data_no_nan.groupby("first_prime").second_entropy.idxmax()
best_second_guesses = data_no_nan.loc[second_prime_index].corresponding_second_prime

best_second_guesses


1880577    26539.0
1876203    26573.0
1017442    86923.0
995572     87523.0
1987983    27893.0
            ...   
2030265    14753.0
1846800    72503.0
1723357    25639.0
2030994    12743.0
2012769    26573.0
Name: corresponding_second_prime, Length: 8363, dtype: float64

In [28]:
full_entropy_list = data_no_nan.groupby(["first_prime", "first_entropy"]).prob_times_second_entropy.sum().reset_index().rename(columns ={"prob_times_second_entropy":"avg_second_entropy"})
full_entropy_list["total_entropy"] = full_entropy_list.first_entropy + full_entropy_list.avg_second_entropy

full_entropy_list["second_prime"] = best_second_guesses.reset_index().corresponding_second_prime

top_entropies = full_entropy_list.sort_values(by = "total_entropy", ascending = False, ignore_index = True)
top_entropies.to_csv("final_entropies_and_best_guesses")
top_entropies

Unnamed: 0,first_prime,first_entropy,avg_second_entropy,total_entropy,second_prime
0,28571,6.490827,5.136506,11.627333,40933.0
1,29873,6.674484,4.950826,11.625310,51047.0
2,58379,6.645493,4.978990,11.624482,10243.0
3,27539,6.682685,4.938994,11.621679,10847.0
4,67819,6.650968,4.970692,11.621660,20357.0
...,...,...,...,...,...
8358,77773,4.344207,5.071487,9.415693,28619.0
8359,10111,3.935964,5.451477,9.387441,27893.0
8360,22229,3.925175,5.421066,9.346241,14753.0
8361,44449,3.886362,5.434456,9.320818,35617.0


So the number with the best total entropy is 28571 with total entropy 11.627 and to get the most information after that number ignoring your outcome result, you should pick 40933 on average (even though this number may not be possible given certain outcomes). oddly, this first entropy for this number is only 6.49, a whole .2 away from 12539, but makes up for it with the amount of information you can gain from a second guess.

Critiques: Could do a third guess for most optimal entropy for each prime, but that would lead to 2million * 3^5 datapoints, which is just too much. At a third guess, we would ideally stop looking for info and start guessing (factoring in both outcomes and only looking at numbers that work with the info we're given. We would be guessing on 2-3 primes on average with the best first and second guess given in the dataframe, 28571 then 40933. If we wanted to factor the outcome for a 2nd guess, we would look at the best prime for that outcome given in the full data frame.
- could also look at mathematical concepts somehow to improve the algorithm
- could ask algorithm what number they chose, the outcome they got, then give user what number they should guess next dynamically instead of looking at all outcomes at once for each number, but that would not lead to a result giving the best first guess
- could generalize this code to work for any wordle variant that takes one guess at a time by removing "prime" everywhere, but to use this as a different wordle variant, they would have to change the function "get_guess_outcomes" to work with that variant, but thats all that needs to be changed.
- could use dictionaries instead of nested lists all around to map values easier