# Intro to the problem and my approach to solving it



Einstein's riddle is a famous logic problem, in which there are 5 houses of unknown but fixed positions relative to each other, each with a unique colour, and each with a resident who has a unique nationality, pet, and drink and cigarette preference.

A series of insights is given to help you solve the puzzle.

You can read the full scope of the problem here - https://www.brainzilla.com/logic/zebra/einsteins-riddle/

My aim is to solve the problem by working towards a dataframe with 5 rows (1 for each house) and 6 columns (1 for each of position, colour, drink, pet, nationality, cigarette preference).

I aim to start by creating a dataframe of all possible rows (5**6 = 15625), and using the insights I have been given as part of the problem to filter out as many impossible rows as I can.

For instance, many of the insights given to solve the problem are of the form:
- The Brit lives in the Red house.

Such insights allow me to delete every row that violates them (i.e. all rows where Nationality == British but Colour != red, and all rows where Colour == red but Nationality != British).
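As a minimal sketch of that rule (the four-row `toy` frame is made up for illustration and is not part of the puzzle data), a row is consistent with "The Brit lives in the Red house" only when the two conditions agree:

```python
import pandas as pd

# Hypothetical four-row frame, used only to illustrate the rule.
toy = pd.DataFrame({
    "Nationality": ["British", "British", "Danish", "Danish"],
    "Colour": ["red", "blue", "red", "blue"],
})

is_brit = toy["Nationality"] == "British"
is_red = toy["Colour"] == "red"

# Keep a row only when both conditions are true or both are false.
consistent = toy[is_brit == is_red]
print(consistent)
```

The British/blue and Danish/red rows are the two kinds of violation described above, so only the first and last rows survive.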




# Create df with all possible combinations

In [None]:
import numpy as np
import pandas as pd

In [None]:
Houses = [1,2,3,4,5]
Colours = ["red", "blue", "green", "white", "yellow"]
Drinks = ["beer", "coffee", "milk", "tea", "water"]
Cigs = ["Blends", "Blue Master", "Durnhill", "Pall Mall", "Prince"]
Pets= ["bird", "cat", "dog", "horse", "fish"]
Nationalities = ["British","Danish","German","Norwegian", "Swedish"]



array =   [(a,b,c,d,e,f) for a in Houses for b in Colours for c in Drinks for d in Cigs for e in Pets for f in Nationalities]

Master_df = pd.DataFrame(array, columns = ['House', 'Colour', 'Drink', 'Cig', 'Pet', 'Nationality'])

Master_df.head()

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality
0,1,red,beer,Blends,bird,British
1,1,red,beer,Blends,bird,Danish
2,1,red,beer,Blends,bird,German
3,1,red,beer,Blends,bird,Norwegian
4,1,red,beer,Blends,bird,Swedish


In [None]:
len(Master_df)

15625

In [None]:
5**6

15625
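The same table can also be built without a six-level comprehension. The sketch below is an alternative, not the approach used above; the names `categories` and `all_rows` are introduced here for illustration only. It relies on `itertools.product` iterating like nested for-loops, with the rightmost list changing fastest.

```python
from itertools import product

import pandas as pd

# Same category values as the lists defined above, keyed by column name.
categories = {
    "House": [1, 2, 3, 4, 5],
    "Colour": ["red", "blue", "green", "white", "yellow"],
    "Drink": ["beer", "coffee", "milk", "tea", "water"],
    "Cig": ["Blends", "Blue Master", "Durnhill", "Pall Mall", "Prince"],
    "Pet": ["bird", "cat", "dog", "horse", "fish"],
    "Nationality": ["British", "Danish", "German", "Norwegian", "Swedish"],
}

# One tuple per combination; column order follows the dict's insertion order.
all_rows = pd.DataFrame(list(product(*categories.values())), columns=list(categories))

print(len(all_rows))  # 15625
```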

In [None]:
Master_df['Id'] = np.arange(0, len(Master_df))


# First Phase of filtering rows

Many of the insights are of the form:

"The Brit lives in the Red House"

Which means we can remove all rows that have:
Nationality == British, Colour!= Red
And all rows that have:
Nationality != British, Colour== Red

Below is a list of all insights that follow this pattern, which I will call pattern A.

My plan is to create a function that filters according to pattern A. The function will take in the variables and values referred to in the insight, and output a filtered df.

I will filter sequentially, using one insight at a time.

Pattern A insights:

The Brit lives in the Red house.

The Swede keeps Dogs as pets.

The Dane drinks Tea.

The owner of the Green house drinks Coffee.

The person who smokes Pall Mall rears Birds.

The owner of the Yellow house smokes Dunhill.

The man living in the centre house drinks Milk.

The Norwegian lives in the first house.

The man who smokes Blue Master drinks Beer.

The German smokes Prince.

**The following pattern A style statements can also be deduced**

The Second House is Blue (because Norwegian, in house 1, lives next to Blue House)



In [None]:
#Pattern A filter takes in information equivalent to that of the form:
# "The Dane Drinks Tea"
#And deletes rows where Nationality == Dane, Drink != Tea ; Nationality != Dane, Drink == Tea

def reduce_using_pattern_A(df, variable_1_name, value_1_name, variable_2_name, value_2_name):

  og_df = df.copy(deep = True)

  #create 2 dataframes made up of the rows that violate the condition from pattern A

  colA = og_df[variable_1_name]
  colB = og_df[variable_2_name]

  condition_1 = np.logical_and(colA == value_1_name, colB != value_2_name)

  #e.g. a dataframe of all rows where nationality is danish, and drink is not tea
  df_to_remove_1 = og_df[condition_1]

  condition_2 = np.logical_and(colA != value_1_name, colB == value_2_name)


  #e.g. a dataframe where drink is tea and nationality is not danish
  df_to_remove_2 = og_df[condition_2]

  df_to_remove = pd.concat([df_to_remove_1, df_to_remove_2], axis = 0)

  filtered_df = og_df[~og_df.Id.isin(df_to_remove.Id)]

  return filtered_df

In [None]:
filtered_1 = reduce_using_pattern_A(Master_df, "Nationality", "British", "Colour", "red")

filtered_2 = reduce_using_pattern_A(filtered_1, "Nationality", "Swedish", "Pet", "dog")

filtered_3 = reduce_using_pattern_A(filtered_2, "Nationality", "Danish", "Drink", "tea")

filtered_4 = reduce_using_pattern_A(filtered_3, "Colour", "green", "Drink", "coffee")

filtered_5 = reduce_using_pattern_A(filtered_4, "Cig", "Pall Mall", "Pet", "bird")

filtered_6 = reduce_using_pattern_A(filtered_5, "Colour", "yellow", "Cig", "Durnhill")

filtered_7 = reduce_using_pattern_A(filtered_6, "House", 3, "Drink", "milk")

filtered_8 = reduce_using_pattern_A(filtered_7, "House", 1, "Nationality", "Norwegian")

filtered_9 = reduce_using_pattern_A(filtered_8, "Cig", "Blue Master", "Drink", "beer")

filtered_10 = reduce_using_pattern_A(filtered_9, "Nationality", "German", "Cig", "Prince")

filtered_11 = reduce_using_pattern_A(filtered_10, "House", 2, "Colour", "blue")


df_after_phase_1 = filtered_11.copy(deep = True)

df_after_phase_1






Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
1383,1,green,coffee,Blends,cat,Norwegian,1383
1393,1,green,coffee,Blends,horse,Norwegian,1393
1398,1,green,coffee,Blends,fish,Norwegian,1398
1453,1,green,coffee,Pall Mall,bird,Norwegian,1453
1908,1,white,beer,Blue Master,cat,Norwegian,1908
...,...,...,...,...,...,...,...
14997,5,white,water,Prince,fish,German,14997
15431,5,yellow,tea,Durnhill,cat,Danish,15431
15441,5,yellow,tea,Durnhill,horse,Danish,15441
15446,5,yellow,tea,Durnhill,fish,Danish,15446


In [None]:
len(df_after_phase_1)

80
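The eleven calls above all have the same shape, so the rules can also be held as data and applied in a loop. This is a sketch under assumptions: `apply_pair_rule` is a hypothetical stand-in for `reduce_using_pattern_A` that uses a single boolean mask, and the full table is rebuilt here only so that the block runs on its own.

```python
from itertools import product

import pandas as pd

categories = {
    "House": [1, 2, 3, 4, 5],
    "Colour": ["red", "blue", "green", "white", "yellow"],
    "Drink": ["beer", "coffee", "milk", "tea", "water"],
    "Cig": ["Blends", "Blue Master", "Durnhill", "Pall Mall", "Prince"],
    "Pet": ["bird", "cat", "dog", "horse", "fish"],
    "Nationality": ["British", "Danish", "German", "Norwegian", "Swedish"],
}
df = pd.DataFrame(list(product(*categories.values())), columns=list(categories))


def apply_pair_rule(frame, var_1, val_1, var_2, val_2):
    """Keep rows where (var_1 == val_1) and (var_2 == val_2) are both true or both false."""
    return frame[(frame[var_1] == val_1) == (frame[var_2] == val_2)]


# The pattern A insights, in the same order as the calls above.
pattern_a_rules = [
    ("Nationality", "British", "Colour", "red"),
    ("Nationality", "Swedish", "Pet", "dog"),
    ("Nationality", "Danish", "Drink", "tea"),
    ("Colour", "green", "Drink", "coffee"),
    ("Cig", "Pall Mall", "Pet", "bird"),
    ("Colour", "yellow", "Cig", "Durnhill"),
    ("House", 3, "Drink", "milk"),
    ("House", 1, "Nationality", "Norwegian"),
    ("Cig", "Blue Master", "Drink", "beer"),
    ("Nationality", "German", "Cig", "Prince"),
    ("House", 2, "Colour", "blue"),
]

for rule in pattern_a_rules:
    df = apply_pair_rule(df, *rule)

print(len(df))  # 80, the same count as len(df_after_phase_1)
```

Keeping the rules in a list makes it easier to spot a misspelled value or a missing insight than scanning a chain of numbered intermediate variables.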

# Second Phase of filtering rows

The remaining insights are of the form:

"The Green house is exactly to the left of the White house"

Before using the full information stored in such insights, I will translate them into simpler insights, so as to perform another round of filtering out impossible rows.

Simpler insights:

- The green house is not house 5
- The white house is not house 1
(where 1 is defined as the leftmost house and 5 as the rightmost; the houses are in a straight line, not a circle)

I will create a function to filter possible rows based on this pattern, pattern B, and then apply it to all relevant insights sequentially.

Pattern B insights:

The Green house is not house 5 / house 5 is not Green
(The Green house is exactly to the left of the White house.)

The White house is not house 1 / house 1 is not White
(The Green house is exactly to the left of the White house.)

The householder who smokes Blends does not keep Cats / the householder who keeps Cats does not smoke Blends
(The man who smokes Blends lives next to the one who keeps Cats.)

The householder who keeps Horses does not smoke Dunhill / the householder who smokes Dunhill does not keep Horses
(The man who keeps Horses lives next to the man who smokes Dunhill.)

The Norwegian does not live in the Blue house / the owner of the Blue house is not Norwegian
(The Norwegian lives next to the Blue house.)

The man who smokes Blends does not drink Water / the person who drinks Water does not smoke Blends
(The man who smokes Blends has a neighbour who drinks Water.)


In [None]:
#Pattern B filter takes in information of the form:
#"The white house is not house 1"
#And deletes rows where Colour == White, House == 1

def reduce_using_pattern_B(df, variable_1_name, value_1_name, variable_2_name, value_2_name):

  og_df = df.copy(deep = True)

  condition_1 = og_df[variable_1_name] == value_1_name
  condition_2 = og_df[variable_2_name] == value_2_name

  condition_combined = np.logical_and(condition_1, condition_2)

  rows_to_remove = og_df[condition_combined]

  rows_to_keep = og_df[~og_df.Id.isin(rows_to_remove.Id)]


  return rows_to_keep
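Because the filter matches on exact strings, a value spelled differently from the entries in the column removes nothing and raises no error. A hedged sketch of a guard against that, using a hypothetical helper `remove_pair_checked` and a made-up three-row frame:

```python
import pandas as pd


def remove_pair_checked(frame, var_1, val_1, var_2, val_2):
    """Drop rows where both values occur together; reject values absent from the frame."""
    for var, val in ((var_1, val_1), (var_2, val_2)):
        if val not in set(frame[var]):
            raise ValueError(f"{val!r} does not occur in column {var!r}")
    both = (frame[var_1] == val_1) & (frame[var_2] == val_2)
    return frame[~both]


# Hypothetical three-row frame, for illustration only.
toy = pd.DataFrame({
    "Pet": ["horse", "horse", "cat"],
    "Cig": ["Durnhill", "Blends", "Durnhill"],
})

print(len(remove_pair_checked(toy, "Pet", "horse", "Cig", "Durnhill")))  # 2

try:
    remove_pair_checked(toy, "Pet", "horse", "Cig", "Dunhill")
except ValueError as err:
    print(err)  # 'Dunhill' does not occur in column 'Cig'
```

The check runs against the current frame, so it would also raise for a value that earlier filters have already eliminated; checking against the original category lists would avoid that.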


In [None]:
filter_1 = reduce_using_pattern_B(df_after_phase_1, "Colour", "green", "House", 5)

filter_2 = reduce_using_pattern_B(filter_1, "Colour", "white", "House", 1)

filter_3 = reduce_using_pattern_B(filter_2, "Cig", "Blends", "Pet", "cat")

#note: "Dunhill" does not match the "Durnhill" spelling used in Cigs, so this call removes no rows
filter_4 = reduce_using_pattern_B(filter_3, "Pet", "horse", "Cig", "Dunhill")

filter_5 = reduce_using_pattern_B(filter_4, "Colour", "blue", "Nationality", "Norwegian")

df_after_phase_2 = reduce_using_pattern_B(filter_5, "Cig", "Blends", "Drink", "water")



df_after_phase_2



Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
1393,1,green,coffee,Blends,horse,Norwegian,1393
1398,1,green,coffee,Blends,fish,Norwegian,1398
1453,1,green,coffee,Pall Mall,bird,Norwegian,1453
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3068,1,yellow,water,Durnhill,horse,Norwegian,3068
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
3789,2,blue,beer,Blue Master,dog,Swedish,3789
4141,2,blue,tea,Blends,horse,Danish,4141
4146,2,blue,tea,Blends,fish,Danish,4146
4201,2,blue,tea,Pall Mall,bird,Danish,4201


In [None]:
len(df_after_phase_2)

55

# Third Phase of filtering rows

In this section I will use the remaining insights, along with at-a-glance observations of the remaining rows, to identify any other combinations that must, or must not, hold.

In [None]:
#my first observation is that
#"The Green house is exactly to the left of the White house"

#means that House 1 can be neither green, nor white
#If House 1 were green, house 2 would have to be white - but house 2 is blue
#If House 1 were white, green could not be to its left



In [None]:
df_after_phase_2[df_after_phase_2.House == 1].Colour.unique()

array(['green', 'yellow'], dtype=object)

In [None]:
#Only other option is that house 1 is yellow

#can use pattern A function to filter: house 1 is yellow (all rows that violate this are removed)

In [None]:
df_after_phase_3_A = reduce_using_pattern_A(df_after_phase_2, "House", 1, "Colour", "yellow")

In [None]:
len(df_after_phase_3_A )

43

In [None]:
df_after_phase_3_A

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3068,1,yellow,water,Durnhill,horse,Norwegian,3068
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
3789,2,blue,beer,Blue Master,dog,Swedish,3789
4141,2,blue,tea,Blends,horse,Danish,4141
4146,2,blue,tea,Blends,fish,Danish,4146
4201,2,blue,tea,Pall Mall,bird,Danish,4201
4357,2,blue,water,Prince,cat,German,4357
4367,2,blue,water,Prince,horse,German,4367
4372,2,blue,water,Prince,fish,German,4372


In [None]:
#"The man who keeps Horses lives next to the man who smokes Dunhill."

#We now know that the man who smokes Dunhill (who lives in the yellow house - from initial insights)
#also lives in house 1
#House 1 has only one neighbour, house 2, meaning we can deduce

#"The man who keeps horses lives in House 2"

#We can hence use pattern A filter to remove rows that violate this


df_after_phase_3_B = reduce_using_pattern_A(df_after_phase_3_A, "House", 2, "Pet", "horse")

In [None]:
len(df_after_phase_3_B)

28

In [None]:
df_after_phase_3_B

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
4367,2,blue,water,Prince,horse,German,4367
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
8389,3,white,milk,Blends,dog,Swedish,8389
8482,3,white,milk,Prince,cat,German,8482
8497,3,white,milk,Prince,fish,German,8497
9405,4,red,beer,Blue Master,cat,British,9405


In [None]:
#There are still rows where house 3 is white, but

#"The Green house is exactly to the left of the White house"

#means that if house 3 were white, house 2 would have to be green - which is not true (house 2 is blue)
#therefore house 3 is not white

#can use pattern B to filter these rows based on statement
#House 3 is not white

df_after_phase_3_C  = reduce_using_pattern_B(df_after_phase_3_B, "House", 3, "Colour", "white")



In [None]:
len(df_after_phase_3_C)

25
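The deduction in the cell above (house 3 cannot be white because house 2 cannot be green) can also be checked mechanically. The sketch below uses a hypothetical helper, `prune_left_of`, that is not part of the original notebook, and a made-up frame of candidate house/colour rows. It drops the "left" value from house h when the "right" value is impossible at house h + 1, and the reverse.

```python
import pandas as pd


def prune_left_of(frame, col, left_val, right_val):
    """Drop rows that cannot satisfy 'left_val is in the house immediately left of right_val'."""
    left_houses = set(frame.loc[frame[col] == left_val, "House"])
    right_houses = set(frame.loc[frame[col] == right_val, "House"])

    # left_val at house h needs right_val to be possible at h + 1, and vice versa.
    bad_left = [h for h in left_houses if h + 1 not in right_houses]
    bad_right = [h for h in right_houses if h - 1 not in left_houses]

    drop = ((frame[col] == left_val) & frame["House"].isin(bad_left)) | (
        (frame[col] == right_val) & frame["House"].isin(bad_right)
    )
    return frame[~drop]


# Hypothetical candidate rows: one row per (house, colour) still considered possible.
toy = pd.DataFrame(
    [(1, "yellow"), (2, "blue"), (3, "red"), (3, "white"),
     (4, "green"), (4, "white"), (5, "white"), (5, "red")],
    columns=["House", "Colour"],
)

pruned = prune_left_of(toy, "Colour", "green", "white")
print(pruned)
```

In this toy frame, white at houses 3 and 4 is dropped because green is not possible at houses 2 and 3. A single pass is not guaranteed to reach a fixed point, so the helper may need to be applied more than once.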

In [None]:
df_after_phase_3_C

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
4367,2,blue,water,Prince,horse,German,4367
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
9405,4,red,beer,Blue Master,cat,British,9405
9420,4,red,beer,Blue Master,fish,British,9420
9950,4,red,water,Pall Mall,bird,British,9950
10764,4,green,coffee,Blends,dog,Swedish,10764


In [None]:
#at a glance I can see that the only possible colour for house 3 is red
#therefore we can use pattern A, and the statement "house 3 is red",
#to filter out examples where houses other than house 3 are red

df_after_phase_3_D = reduce_using_pattern_A(df_after_phase_3_C, "House", 3, "Colour", "red")

In [None]:
len(df_after_phase_3_D)

19

In [None]:
df_after_phase_3_D

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
4367,2,blue,water,Prince,horse,German,4367
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
10764,4,green,coffee,Blends,dog,Swedish,10764
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
11289,4,white,beer,Blue Master,dog,Swedish,11289


In [None]:
#We can now see green house must be 4, the white house must be 5, in order to fulfill:
#"The Green house is exactly to the left of the White house"

#Given that green and white are the only colour options left for houses 4 and 5,
#using pattern A with "House 4 is green" removes every non-green row at house 4 and every green row elsewhere,
#which leaves only green at house 4 and only white at house 5, which is what I want
#I don't need to apply both "House 4 is green" and "House 5 is white", as each implies the other



In [None]:
df_after_phase_3_E = reduce_using_pattern_A(df_after_phase_3_D, "House", 4, "Colour", "green")

In [None]:
len(df_after_phase_3_E)

14

In [None]:
df_after_phase_3_E

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
4367,2,blue,water,Prince,horse,German,4367
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
10764,4,green,coffee,Blends,dog,Swedish,10764
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414


In [None]:
#At a glance, house 1's drink must be water as this is the only possible option

df_after_phase_3_F = reduce_using_pattern_A(df_after_phase_3_E, "House", 1, "Drink", "water")

In [None]:
len(df_after_phase_3_F)

11

In [None]:
df_after_phase_3_F

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
10764,4,green,coffee,Blends,dog,Swedish,10764
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414
14771,5,white,tea,Blends,fish,Danish,14771


In [None]:
#house 2 must be the house that drinks tea

df_after_phase_3_G = reduce_using_pattern_A(df_after_phase_3_F, "House", 2, "Drink", "tea")

In [None]:
len(df_after_phase_3_G)

9

In [None]:
df_after_phase_3_G

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575
10764,4,green,coffee,Blends,dog,Swedish,10764
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414


In [None]:
#House 2 must smoke blends

df_after_phase_3_H = reduce_using_pattern_A(df_after_phase_3_G, "House", 2, "Cig", "Blends")

In [None]:
len(df_after_phase_3_H)

7

In [None]:
df_after_phase_3_H

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
6575,3,red,milk,Pall Mall,bird,British,6575
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414


In [None]:
#review final two insights:

#A) The man who smokes Blends has a neighbour who drinks Water.
#B) The man who smokes Blends lives next to the one who keeps Cats.

#Blends smoker is in House 2
#House 1 drinks water, therefore first insight fulfilled already

#For insight B,
#This means that either household 1 or 3 keeps a cat
#Household 3 keeps birds, so household 1 must keep a cat




reduce_using_pattern_A(df_after_phase_3_H, "House", 1, "Pet", "cat")

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
4141,2,blue,tea,Blends,horse,Danish,4141
6575,3,red,milk,Pall Mall,bird,British,6575
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414
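As a cross-check that does not depend on the filtering steps, the five rows above can be tested directly against all fifteen clues. A minimal sketch, assuming the helper names `house_of`, `same_house` and `neighbours` (hypothetical, not defined elsewhere in this notebook) and keeping the notebook's "Durnhill" spelling:

```python
import pandas as pd

# The five rows of the final answer, copied from the output above.
solution = pd.DataFrame(
    [
        (1, "yellow", "water", "Durnhill", "cat", "Norwegian"),
        (2, "blue", "tea", "Blends", "horse", "Danish"),
        (3, "red", "milk", "Pall Mall", "bird", "British"),
        (4, "green", "coffee", "Prince", "fish", "German"),
        (5, "white", "beer", "Blue Master", "dog", "Swedish"),
    ],
    columns=["House", "Colour", "Drink", "Cig", "Pet", "Nationality"],
)


def house_of(col, val):
    """Return the house number whose `col` equals `val`."""
    return int(solution.loc[solution[col] == val, "House"].iloc[0])


def same_house(c1, v1, c2, v2):
    return house_of(c1, v1) == house_of(c2, v2)


def neighbours(c1, v1, c2, v2):
    return abs(house_of(c1, v1) - house_of(c2, v2)) == 1


checks = [
    same_house("Nationality", "British", "Colour", "red"),
    same_house("Nationality", "Swedish", "Pet", "dog"),
    same_house("Nationality", "Danish", "Drink", "tea"),
    same_house("Colour", "green", "Drink", "coffee"),
    same_house("Cig", "Pall Mall", "Pet", "bird"),
    same_house("Colour", "yellow", "Cig", "Durnhill"),
    same_house("Cig", "Blue Master", "Drink", "beer"),
    same_house("Nationality", "German", "Cig", "Prince"),
    house_of("Drink", "milk") == 3,
    house_of("Nationality", "Norwegian") == 1,
    house_of("Colour", "green") + 1 == house_of("Colour", "white"),
    neighbours("Cig", "Blends", "Pet", "cat"),
    neighbours("Pet", "horse", "Cig", "Durnhill"),
    neighbours("Nationality", "Norwegian", "Colour", "blue"),
    neighbours("Cig", "Blends", "Drink", "water"),
]

print(all(checks), len(checks))  # True 15
```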


In [None]:
#The final answer.
#Have checked my results on a logic problem website, and they are correct.

#see results here - https://gohighbrow.com/einstein-riddle-solution/#:~:text=The%20owner%20who%20smokes%20Blend,%233%2C%20keeps%20birds).&text=And%20the%20linchpin%3A&text=The%20Swede%20keeps%20dogs%20as%20pets%2C%20which%20means%20%E2%80%A6%20the%20German,And%20there%20you%20have%20it.

# Reflections

In [None]:
#My main reflection on areas for improvement is that for Phase 3, I often relied
#on observations to find new information about what patterns
#must/must not be the case

#for instance, I was able to observe that all rows containing House 3 also contained the colour red
#from this I deduced that the colour red could not appear in the same row as any house but
#house 3
#Given that I could also see that there were rows where the house colour was red but the house number was not 3,
#I knew these rows had to be filtered out


#Given that there were multiple occasions where I followed the same pattern of logic
#to remove rows, I could have created a function that completed the operation of:
#- identifying combinations that had to go together
#- using this insight to remove rows that could thus not be part of the solution

#In the case of this problem, it didn't seem worth the time creating such a function as
#I could spot such patterns at a glance, and it is worthwhile getting your teeth into
#the particulars of the data where it is manageable

#that said, for similar problems but with more variables or observations,
#it may be worth having a more automated approach

#I will hence develop these functions as a proof of concept

In [None]:
#I will start with a function that outputs a dataframe containing the variables/values for
#all cooccuring pairs


def find_couccurent_values (df):

  df = df.copy(deep = True)

  #remove Id column as all values are unique
  if "Id" in df.columns:
    df.drop(["Id"], axis = 1, inplace = True)

  #Create a list of all columns in df
  cols = list(df.columns)

  #create a 3-layer nested loop that,
  #for each primary variable (e.g. House),
  #makes a filtered dataframe for each unique value of that primary variable
  #(e.g. House = 1, House = 2, etc),
  #then loops through each other column of the filtered df to check whether
  #it contains only a single unique value
  #If it does, the pair is recorded in a dataframe

  Couccuring_pairs_storage = pd.DataFrame({'var1':[], 'value1':[],
                                           'var2':[], 'value2':[]})

  for i in range(0, len(cols)):

    col = cols[i]
    primary_column = df[col]

    primary_unique_values = primary_column.unique()

    for j in range(0, len(primary_unique_values)):

      primary_column_value = primary_unique_values[j]

      filtered_df = df[primary_column == primary_column_value ]

      for k in range(0, len(cols)):

        if i == k:

          continue

        else:

          col_2 = cols[k]
          column_to_count_unique_values = filtered_df[col_2]

          secondary_unqiue_values = column_to_count_unique_values.unique()

          if len(secondary_unqiue_values) == 1:

            row =  pd.DataFrame({'var1':[col], 'value1':[primary_column_value],
                               'var2':[col_2], 'value2':[secondary_unqiue_values[0]]})

            Couccuring_pairs_storage = pd.concat([Couccuring_pairs_storage, row], axis = 0)



  return Couccuring_pairs_storage
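The same search can be written more compactly with `groupby`. This is a sketch of an alternative, not a drop-in replacement: `cooccurring_pairs` is a hypothetical name, and the three-column `toy` frame is made up for illustration. Collecting the results in a list and building the output once also avoids calling `pd.concat` inside the loop.

```python
import pandas as pd


def cooccurring_pairs(frame):
    """Return (var1, value1, var2, value2) rows where value1 always appears with value2."""
    records = []
    for var_1 in frame.columns:
        for var_2 in frame.columns:
            if var_1 == var_2:
                continue
            # For each value of var_1: number of distinct var_2 values, and the first one seen.
            summary = frame.groupby(var_1)[var_2].agg(["nunique", "first"])
            for value_1, row in summary[summary["nunique"] == 1].iterrows():
                records.append((var_1, value_1, var_2, row["first"]))
    return pd.DataFrame(records, columns=["var1", "value1", "var2", "value2"])


# Hypothetical frame: house 1 is always yellow, but yellow also appears at house 3.
toy = pd.DataFrame({
    "House": [1, 1, 2, 3],
    "Colour": ["yellow", "yellow", "blue", "yellow"],
    "Pet": ["cat", "fish", "horse", "bird"],
})

pairs = cooccurring_pairs(toy)
print(pairs)
```

In this toy frame the pair (House 1, Colour yellow) is reported, but (Colour yellow, House ...) is not, because yellow occurs with more than one house.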

In [None]:
find_couccurent_values(df_after_phase_1)

Unnamed: 0,var1,value1,var2,value2
0,House,1.0,Nationality,Norwegian
0,House,2.0,Colour,blue
0,House,3.0,Drink,milk
0,Colour,green,Drink,coffee
0,Colour,yellow,Cig,Durnhill
0,Colour,blue,House,2
0,Colour,red,Nationality,British
0,Drink,coffee,Colour,green
0,Drink,beer,Cig,Blue Master
0,Drink,tea,Nationality,Danish


In [None]:
#testing the function on df_after_phase_1,
#I can see that the output essentially restates the insights I used to filter
#out rows as part of phase 1 (insights of the form "The Brit lives in the Red house")

#Noteworthy is that these pairs would not be useful for further filtering,
#as they represent filters that I have already applied

#This is demonstrated by the fact that var/val pairs that co-occur exclusively
#appear twice in the above table, once in each direction.
#e.g. (Colour green, Drink coffee) also appears as (Drink coffee, Colour green)


In [None]:
find_couccurent_values(df_after_phase_3_D)

Unnamed: 0,var1,value1,var2,value2
0,House,1.0,Colour,yellow
0,House,1.0,Drink,water
0,House,1.0,Cig,Durnhill
0,House,1.0,Nationality,Norwegian
0,House,2.0,Colour,blue
0,House,2.0,Pet,horse
0,House,3.0,Colour,red
0,House,3.0,Drink,milk
0,House,3.0,Nationality,British
0,House,5.0,Colour,white


In [None]:
len(find_couccurent_values(df_after_phase_3_D))

55

In [None]:
#Using this function on df_after_phase_3_D, I can see many cases where var/val pairs co-occur symmetrically.
#For instance, the first line shows that where House is 1, Colour must be yellow. Lower down, I can also see that where Colour is yellow, House must be 1.

#This symmetry is not guaranteed: every House 1 row could be yellow while rows for other houses are also yellow.

#Notably, at least one such one-way case must exist here, because the table has an odd number of rows (55),
#so at least one row has no symmetrical counterpart.

#I am only interested in the rows without a symmetrical counterpart (e.g. every House 3 row is red, but red appears for several house numbers),
#because these are the cases that allow me to filter out rows. Rows with a symmetrical counterpart reflect filtering that has already been done.

#So I want a function that leaves only the rows that are useful for filtering

In [None]:
def find_assymetrical_cases(df):

  assymetrical_cases = []

  for i in range(0, len(df)):

    row = df.iloc[i]

    condition_1 = row['var1'] == df['var2']
    condition_2 = row['value1'] == df['value2']
    condition_3 = row['var2'] == df['var1']
    condition_4 = row['value2'] == df['value1']

    c1_2 = np.logical_and(condition_1, condition_2)
    c3_4 = np.logical_and(condition_3, condition_4)

    conditions_combined = np.logical_and (c1_2, c3_4)

    df_filtered_for_symmetrical_cases = df[conditions_combined]

    if len(df_filtered_for_symmetrical_cases) == 0:

      assymetrical_cases.append(i)

  df_asym_cases = df.iloc[assymetrical_cases]

  return df_asym_cases
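The mirror check can also be done with a set lookup instead of one dataframe comparison per row. A minimal sketch, assuming a hypothetical function name `one_way_pairs` and a made-up three-row input in the same `var1, value1, var2, value2` layout:

```python
import pandas as pd


def one_way_pairs(pairs):
    """Keep rows (var1, value1, var2, value2) whose mirror (var2, value2, var1, value1) is absent."""
    cols = ["var1", "value1", "var2", "value2"]
    rows = list(pairs[cols].itertuples(index=False, name=None))
    seen = set(rows)
    # A row is one-way when its reversed form does not occur anywhere in the table.
    keep = [(v2, b, v1, a) not in seen for (v1, a, v2, b) in rows]
    return pairs[keep]


# Hypothetical input: the first two rows mirror each other, the third has no mirror.
toy_pairs = pd.DataFrame(
    [
        ("House", 1, "Colour", "yellow"),
        ("Colour", "yellow", "House", 1),
        ("House", 3, "Colour", "red"),
    ],
    columns=["var1", "value1", "var2", "value2"],
)

result = one_way_pairs(toy_pairs)
print(result)
```

Only the (House 3, Colour red) row is returned, which is the kind of pair that can still remove rows.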



In [None]:
find_assymetrical_cases(find_couccurent_values(df_after_phase_3_D))

Unnamed: 0,var1,value1,var2,value2
0,House,1.0,Drink,water
0,House,5.0,Colour,white
0,Colour,yellow,Drink,water
0,Colour,green,House,4
0,Drink,coffee,House,4
0,Drink,beer,Colour,white
0,Drink,beer,Pet,dog
0,Drink,beer,Nationality,Swedish
0,Cig,Durnhill,Drink,water
0,Cig,Blue Master,Colour,white


In [None]:
#Now I want a higher level function that:

#takes in a dataframe of possible rows that need filtering

#finds the co-occurring values

#finds the asymmetrical cases

#uses the asymmetrical cases to filter the originally inputted dataframe, using the pattern A filter


def filter_using_assymetrical_co_occurance (df):

  all_cooccuring_var_vals = find_couccurent_values(df)

  assym_cooccuring_var_vals = find_assymetrical_cases(all_cooccuring_var_vals)


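  #intended to use each row of the resultant df to do a pattern A filter
  #note: the .iloc[0, ...] lookups below always read row 0, so each call applies only the first
  #asymmetrical case (repeatedly); .iloc[i, ...] would apply every case in a single call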

  filtered_df = df

  for i in range(0, len(assym_cooccuring_var_vals)):

    filtered_df = reduce_using_pattern_A(filtered_df, assym_cooccuring_var_vals.iloc[0,0],
                                         assym_cooccuring_var_vals.iloc[0,1],
                                         assym_cooccuring_var_vals.iloc[0,2],
                                         assym_cooccuring_var_vals.iloc[0,3])

  return filtered_df

In [None]:
#Will use the function above to filter df_after_phase_2

In [None]:
len(df_after_phase_2)

55

In [None]:
filtered_A = filter_using_assymetrical_co_occurance(df_after_phase_2)
len(filtered_A)

55

In [None]:
#no rows have been removed, let's look at why

a = find_couccurent_values(df_after_phase_2)
b = find_assymetrical_cases(a)
b

Unnamed: 0,var1,value1,var2,value2


In [None]:
#it appears that, without the leaps of logic I made at the beginning of phase 3, there aren't any asymmetrical cases to filter by

#want to see if function above works for df_after_phase_3_A

In [None]:
filtered_B = filter_using_assymetrical_co_occurance(df_after_phase_3_A)
len(filtered_B)

32

In [None]:
#Filtering df_after_phase_3_A with the new filter does reduce number of rows, meaning that the filter is able to identify
#values that must be attached to certain variables, thus allowing for other rows that violate this rule to be removed.

In [None]:
#now see if more assymetrical cases have appeared, by filtering filtered_B

In [None]:
filtered_C = filter_using_assymetrical_co_occurance(filtered_B)
len(filtered_C)

25

In [None]:
#Using new filter in succession can, if conditions are right, give sequential reductions in number of rows, even without inputting further insights

In [None]:
filtered_D = filter_using_assymetrical_co_occurance(filtered_C)
len(filtered_D)

25

In [None]:
#although it appears it can exhaust possible rows to filter out without additional insights, as shown by
#there being no change between filtered_C and filtered_D
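Re-running the filter by hand until the row count stops changing can be wrapped in a small loop. This is a sketch under assumptions: `apply_until_stable` and the toy `drop_largest` step are hypothetical names used only for illustration, and `max_rounds` is an arbitrary safety limit.

```python
import pandas as pd


def apply_until_stable(frame, step, max_rounds=50):
    """Call step(frame) repeatedly until the row count stops shrinking."""
    for _ in range(max_rounds):
        reduced = step(frame)
        if len(reduced) == len(frame):
            return reduced
        frame = reduced
    return frame


# Hypothetical step: drop the largest value while more than two rows remain.
def drop_largest(frame):
    if len(frame) <= 2:
        return frame
    return frame[frame["x"] < frame["x"].max()]


toy = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
print(len(apply_until_stable(toy, drop_largest)))  # 2
```

In this notebook the equivalent call would be `apply_until_stable(df_after_phase_3_A, filter_using_assymetrical_co_occurance)`.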

In [None]:
filtered_D

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3068,1,yellow,water,Durnhill,horse,Norwegian,3068
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
3789,2,blue,beer,Blue Master,dog,Swedish,3789
4141,2,blue,tea,Blends,horse,Danish,4141
4146,2,blue,tea,Blends,fish,Danish,4146
4201,2,blue,tea,Pall Mall,bird,Danish,4201
6515,3,red,milk,Blends,horse,British,6515
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575


In [None]:
#no more rows to remove using asymmetrical cases

#will need to use insights such as:
#the green house is exactly to the left of the white house (removes all cases where House = 3, Colour = white)
#removing these rows should create more asymmetrical cases

In [None]:
rows_to_remove_from_FD = filtered_D[np.logical_and(filtered_D.House == 3, filtered_D.Colour == "white")]
rows_to_remove_from_FD

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
8389,3,white,milk,Blends,dog,Swedish,8389
8482,3,white,milk,Prince,cat,German,8482
8492,3,white,milk,Prince,horse,German,8492
8497,3,white,milk,Prince,fish,German,8497


In [None]:
filtered_E = filtered_D[filtered_D.Id.isin(rows_to_remove_from_FD.Id)== False]
filtered_E

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3068,1,yellow,water,Durnhill,horse,Norwegian,3068
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
3789,2,blue,beer,Blue Master,dog,Swedish,3789
4141,2,blue,tea,Blends,horse,Danish,4141
4146,2,blue,tea,Blends,fish,Danish,4146
4201,2,blue,tea,Pall Mall,bird,Danish,4201
6515,3,red,milk,Blends,horse,British,6515
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575


In [None]:
len(filtered_E)

21

In [None]:
#now I want to see if I can get further reductions of rows using the new filter

filtered_F = filter_using_assymetrical_co_occurance(filtered_E)
len(filtered_F)

18

In [None]:
#success, will try using new filter again

In [None]:
filtered_G = filter_using_assymetrical_co_occurance(filtered_F)
len(filtered_G)

17

In [None]:
#another reduction - though only by one row

In [None]:
filtered_H = filter_using_assymetrical_co_occurance(filtered_G)
len(filtered_H)

17

In [None]:
#no reductions this time, will need new insights

In [None]:
filtered_H

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3068,1,yellow,water,Durnhill,horse,Norwegian,3068
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
3789,2,blue,beer,Blue Master,dog,Swedish,3789
4141,2,blue,tea,Blends,horse,Danish,4141
4146,2,blue,tea,Blends,fish,Danish,4146
4201,2,blue,tea,Pall Mall,bird,Danish,4201
6515,3,red,milk,Blends,horse,British,6515
6520,3,red,milk,Blends,fish,British,6520
6575,3,red,milk,Pall Mall,bird,British,6575


In [None]:
#Can use the man who keeps horses lives next to the man who smokes durnhill
#House 2 should be horses

In [None]:
filtered_i = reduce_using_pattern_A(filtered_H, "House", 2, "Pet", 'horse')
len(filtered_i)

10

In [None]:
#now will use new filter again

In [None]:
filtered_j =filter_using_assymetrical_co_occurance(filtered_i)
len(filtered_j)

8

In [None]:
#and again

In [None]:
filtered_k =filter_using_assymetrical_co_occurance(filtered_j)
len(filtered_k)

7

In [None]:
#another reduction, will use again

In [None]:
filtered_L =filter_using_assymetrical_co_occurance(filtered_k)
len(filtered_L)

7

In [None]:
#no more reductions

In [None]:
filtered_L

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
3073,1,yellow,water,Durnhill,fish,Norwegian,3073
4141,2,blue,tea,Blends,horse,Danish,4141
6575,3,red,milk,Pall Mall,bird,British,6575
10857,4,green,coffee,Prince,cat,German,10857
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414


In [None]:
#the insight that the man who smokes Blends lives next to the man who keeps cats dictates that House 1 is the household with the cat

filtered_M = reduce_using_pattern_A(filtered_L,"House", 1, "Pet", "cat")
filtered_M

Unnamed: 0,House,Colour,Drink,Cig,Pet,Nationality,Id
3058,1,yellow,water,Durnhill,cat,Norwegian,3058
4141,2,blue,tea,Blends,horse,Danish,4141
6575,3,red,milk,Pall Mall,bird,British,6575
10872,4,green,coffee,Prince,fish,German,10872
14414,5,white,beer,Blue Master,dog,Swedish,14414


In [None]:
#arrives at the right answer

In [None]:
#overall - developing the filter that uses asymmetrical co-occurrence to remove rows has proved of some use
#It does allow for the removal of impossible rows without the need for additional insights
#However, in this situation it reaches a point where no more rows can be removed through this method,
#so additional insights are needed anyway
#This makes it a useful tool, but one that does not remove the need for getting one's hands dirty in the data

#Also, I have found building such a function a valuable exercise for getting to grips with logic problems