# Rubik's Cubes: An Analysis of Scrambles
Jaraad Kamal

## Background and Definitions
The Rubik's Cube is a puzzle created by Ernő Rubik in 1974. Over the years it has gained immense popularity as more and more people learn to solve it and compete. 
I have been solving Rubik's Cubes for over half my life. I am by no means a competition-level solver, but I have always been interested in the highest levels of speed cubing.
> **Speed Cubing**
> <br>
> Competitively solving Rubik's Cubes as fast as possible.

In this tutorial we will discuss the scrambles of a **3x3 Rubik's Cube**. 
> **Scramble**
> <br>
> A random set of moves used to get a Rubik's Cube or puzzle into a random unsolved state.
> <br>
> The moves needed to "mix up" a Rubik's Cube.
### Goal
Competitions have been held since the 1980s. In this tutorial we will find out whether specific scrambles (initial shuffles of the cube) are harder to solve than others. Further, we can develop a model to determine whether a particular scramble is more difficult than others.


### Notation
Before getting into the data and the code, we must first understand how Rubik's Cube notation works.
<br>
The notation is used as a way to describe which moves are being performed.
<br><br>
There are 4 types of moves for a 3x3 Rubik's Cube:
- Whole cube rotations
- Face Turns
- Wide Moves
- Slice Moves

> **Visual Depictions**
> <br>https://jperm.net/3x3/moves

For the purposes of this tutorial only **Face Turns** will be examined, as they are the only type of move used in scrambles.
<br>
*(note: every type of move can be accomplished with only face turns)*

#### Face Turns
Each Face Turn corresponds to a particular face of the cube.
The basic moves are: 

|Name   | Notation| Variant  |
|----   | --------| ---------|
| Up    | U       | U2 or U' |
| Down  | D       | D2 or D' |
| Left  | L       | L2 or L' |
| Right | R       | R2 or R' |
| Front | F       | F2 or F' |
| Back  | B       | B2 or B' |


Each move corresponds to one of the 6 faces of the cube.
<br>
Each letter means turning that face of the cube 90 degrees clockwise (when viewing the face head on). Adding `2` means rotating the face 180 degrees (90 degrees twice), and an apostrophe `'` (pronounced *prime*) means a counterclockwise rotation.  
> **Example**
><br>
> `U` means turn the top face 90 degrees **clockwise**
><br>
> `U2` means turn the top face 180 degrees
><br>
> `U'` means turn the top face 90 degrees **counterclockwise**
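
The rules above are mechanical enough to express in a few lines of code. Here is a minimal sketch of a hypothetical helper, `parse_move`, which is not part of any library. It splits a single face-turn token into its face letter and a signed rotation in degrees. It assumes tokens are written exactly as in the table above (an uppercase face letter, optionally followed by `2` or `'`).

```python
from __future__ import annotations

# The six faces that a face turn can refer to.
FACES = {"U", "D", "L", "R", "F", "B"}


def parse_move(token: str) -> tuple[str, int]:
    """Hypothetical helper: return (face, degrees) for one face turn.

    Positive degrees are clockwise, negative are counterclockwise.
    """
    if not token:
        raise ValueError("empty move")
    face, suffix = token[0], token[1:]
    if face not in FACES or suffix not in ("", "2", "'"):
        raise ValueError(f"not a face turn: {token!r}")
    if suffix == "2":
        return face, 180   # half turn
    if suffix == "'":
        return face, -90   # prime = counterclockwise quarter turn
    return face, 90        # plain letter = clockwise quarter turn


for token in ["U", "U2", "U'"]:
    print(token, parse_move(token))
```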

A typical scramble thus looks like:
<br>
`D U2 F2 D R2 D2 L2 U' R2 B' R2 D F U2 F L2 R' D L`
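
A scramble is a space-separated string of face turns, so ordinary string handling is enough to work with it. The sketch below uses the example scramble above to count moves, tally how often each face is turned, and build the inverse sequence that would undo the scramble (reverse the order and flip each quarter turn's direction). The `invert` function is a hypothetical illustration, not something from the WCA data or tools.

```python
from collections import Counter

scramble = "D U2 F2 D R2 D2 L2 U' R2 B' R2 D F U2 F L2 R' D L"

# Moves are separated by spaces, so split() yields one token per move.
moves = scramble.split()
print(len(moves))  # 19

# How often each face is turned (the first character of each token).
face_counts = Counter(move[0] for move in moves)
print(face_counts)


def invert(moves):
    """Return the sequence that undoes `moves`: reverse the order and swap
    clockwise/counterclockwise. Half turns are their own inverse."""
    inverse = []
    for move in reversed(moves):
        if move.endswith("'"):
            inverse.append(move[0])       # X' is undone by X
        elif move.endswith("2"):
            inverse.append(move)          # X2 is undone by X2
        else:
            inverse.append(move + "'")    # X is undone by X'
    return inverse


print(" ".join(invert(moves)))
```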

Now that we understand how notation and scrambles work, we can get into the actual data wrangling.

## Getting the Data
For this project I will be using the database created by the **World Cube Association** (WCA), which hosts the largest and most up-to-date database of competitions throughout the world. I will be using the data up to April 14<sup>th</sup>, 2022.
> **Links**
><br>
> WCA Homepage: https://www.worldcubeassociation.org/
><br>
> WCA Database Download: https://www.worldcubeassociation.org/results/misc/export.html

For this tutorial, download the **tsv zip file** and extract its contents into a subfolder of your choice (the code in this tutorial reads from a folder named `extracted_tsv`).
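
If you prefer to do the extraction from Python, here is a minimal sketch using the standard-library `zipfile` module. The archive name `WCA_export.tsv.zip` is an assumption, so change it to match the file you actually downloaded. The sketch also assumes the `.tsv` files sit at the top level of the archive.

```python
import zipfile
from pathlib import Path


def extract_export(archive, target):
    """Extract every file in the zip archive into `target` and return the
    sorted names of the .tsv files found at the top level of `target`."""
    archive, target = Path(archive), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.glob("*.tsv"))


# Assumed name of the downloaded archive; adjust to match your download.
archive = Path("WCA_export.tsv.zip")
if archive.exists():
    print(extract_export(archive, "extracted_tsv"))
```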

## Getting to Know the Data
The files created by the WCA are *big*, so trying to open them in a text editor is a bad idea. Before blindly coding, it is important to get comfortable with the way the data is formatted. The database comes with a **README** file that gives an overview. 
<br><br>
Briefly, the tsv files are a collection of tables, each with its own information. According to the **README**, the database itself consists of the following tables:

| Table                                   | Contents                                           |
| --------------------------------------- | -------------------------------------------------- |
| Persons                                 | WCA competitors                                    |
| Competitions                            | WCA competitions                                   |
| Events                                  | WCA events (3x3x3 Cube, Megaminx, etc)             |
| Results                                 | WCA results per competition+event+round+person     |
| RanksSingle                             | Best single result per competitor+event and ranks  |
| RanksAverage                            | Best average result per competitor+event and ranks |
| RoundTypes                              | The round types (first, final, etc)                |
| Formats                                 | The round formats (best of 3, average of 5, etc)   |
| Countries                               | Countries                                          |
| Continents                              | Continents                                         |
| Scrambles                               | Scrambles                                          |
| championships                           | Championship competitions                          |
| eligible_country_iso2s_for_championship | Explained in the README                            |


>**Note**
><br>
>For this tutorial we are examining whether there are specific scrambles that are harder than others. 
>To do this we will only look at the **Scrambles** and **Results** tables. 
>The remaining files will not be of any use to us.

## The Code
For this tutorial I will be working in **Python 3**.
<br>
First, let's import some libraries that will be useful later on.

```python
import pandas as pd
import matplotlib.pyplot as plt
```

### Loading the Data
The data we need are stored in two files called **WCA_export_Results.tsv** and **WCA_export_Scrambles.tsv**.
To work with this data we will load their information into two variables called `results_frame` and `scrambles_frame`.

```python
results_frame = pd.read_csv("extracted_tsv/WCA_export_Results.tsv", sep = '\t')
scrambles_frame = pd.read_csv("extracted_tsv/WCA_export_Scrambles.tsv", sep = '\t')
```

Now we can take a small look at the data.

```python
display(results_frame.head())
display(scrambles_frame.head())
print("size of results is: ", len(results_frame))
print("size of scrambles is: ", len(scrambles_frame))
```

| | competitionId | eventId | roundTypeId | pos | best | average | personName | personId | personCountryId | formatId | value1 | value2 | value3 | value4 | value5 | regionalSingleRecord | regionalAverageRecord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LyonOpen2007 | 333 | 1 | 15 | 1968 | 2128 | Etienne Amany | 2007AMAN01 | Cote d_Ivoire | a | 1968 | 2203 | 2138 | 2139 | 2108 | AfR | AfR |
| 1 | LyonOpen2007 | 333 | 1 | 16 | 1731 | 2140 | Thomas Rouault | 2004ROUA01 | France | a | 2222 | 2153 | 1731 | 2334 | 2046 | | |
| 2 | LyonOpen2007 | 333 | 1 | 17 | 2305 | 2637 | Antoine Simon-Chautemps | 2005SIMO01 | France | a | 3430 | 2581 | 2540 | 2789 | 2305 | | |
| 3 | LyonOpen2007 | 333 | 1 | 18 | 2452 | 2637 | Irène Mallordy | 2007MALL01 | France | a | 2715 | 2452 | 2868 | 2632 | 2564 | | |
| 4 | LyonOpen2007 | 333 | 1 | 19 | 2677 | 2906 | Marlène Desmaisons | 2007DESM01 | France | a | 2921 | 3184 | 2891 | 2677 | 2907 | | |

| | scrambleId | competitionId | eventId | roundTypeId | groupId | isExtra | scrambleNum | scramble |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | GaleriesDorianOpen2014 | pyram | 1 | A | 0 | 1 | U R' L' B U B' R' B' L' U L' u' r' b' |
| 1 | 2 | GaleriesDorianOpen2014 | pyram | 1 | A | 0 | 2 | B' L' U' B U' L U' R B' R' L' u r |
| 2 | 3 | GaleriesDorianOpen2014 | pyram | 1 | A | 0 | 3 | R' U R' L' B' U L' B' R L' U l' |
| 3 | 4 | GaleriesDorianOpen2014 | pyram | 1 | A | 0 | 4 | L R' L U B R L R B R L' u' l |
| 4 | 5 | GaleriesDorianOpen2014 | pyram | 1 | A | 0 | 5 | B' U R L' R B L' U' B' R' U' l' r' |

```
size of results is:  2789385
size of scrambles is:  1280056
```
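
The `best`, `average` and `value1` to `value5` columns are not in seconds. According to the WCA export README, results for timed events such as `333` are stored in centiseconds, with the special codes `-1` (DNF), `-2` (DNS) and `0` (no result). Below is a minimal sketch that decodes them using the first row shown above. `decode_times` is a hypothetical helper, and the average-of-5 step (drop the best and worst solves, average the middle three) is only a quick consistency check against the `average` column of 2128. It is not a full implementation of the WCA rules.

```python
import pandas as pd

# value1..value5 for the first row above (Etienne Amany, LyonOpen2007).
raw = pd.Series([1968, 2203, 2138, 2139, 2108])


def decode_times(values: pd.Series) -> pd.Series:
    """Convert timed-event result values to seconds.

    Positive values are centiseconds; -1 (DNF), -2 (DNS) and 0 (no result)
    become NaN so they are not mistaken for real times.
    """
    return values.where(values > 0) / 100


seconds = decode_times(raw)
print(seconds.tolist())  # [19.68, 22.03, 21.38, 21.39, 21.08]

# Average of 5: drop the single best and worst solve, average the rest.
middle_three = seconds.sort_values().iloc[1:-1]
print(round(middle_three.mean(), 2))  # 21.28
```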


#### Observations
As you can see there is **a lot** of data, and a lot of the data is information that we do not need.
<br>
>**Example**
<br>
>The tables contain information about events related to *pyramix* and other similar puzzle competitions. This can be seen in the `eventId` column. 

We are looking for **3x3x3** Rubik's Cube events. Let's take a look at the types of events in both tables to determine what this event is named.

The types of puzzles in the results table.

```python
display(results_frame['eventId'].unique())
```

```
array(['333', '333oh', '444', '555', '333bf', '333mbo', 'minx', '333ft',
       'mmagic', 'clock', '333fm', '222', 'magic', 'sq1', 'pyram',
       '444bf', '555bf', '666', '777', '333mbf', 'skewb'], dtype=object)
```

Types of puzzles in the scrambles table.

```python
display(scrambles_frame['eventId'].unique())
```

```
array(['pyram', '333bf', '333', '333oh', '444', '222', '666', 'minx',
       'skewb', '777', '555', '333fm', 'sq1', '333ft', '444bf', 'clock',
       '555bf', '333mbf'], dtype=object)
```
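
The two lists are not the same length (21 events versus 18), so it is worth checking which events have results but no scrambles. Here is a small sketch using Python sets, with the IDs copied from the two outputs above. On the live data the same check would be `set(results_frame['eventId'].unique()) - set(scrambles_frame['eventId'].unique())`, run before any trimming.

```python
# Event IDs copied from the two outputs above.
results_events = {'333', '333oh', '444', '555', '333bf', '333mbo', 'minx', '333ft',
                  'mmagic', 'clock', '333fm', '222', 'magic', 'sq1', 'pyram',
                  '444bf', '555bf', '666', '777', '333mbf', 'skewb'}
scramble_events = {'pyram', '333bf', '333', '333oh', '444', '222', '666', 'minx',
                   'skewb', '777', '555', '333fm', 'sq1', '333ft', '444bf', 'clock',
                   '555bf', '333mbf'}

# Events that appear in the results table but never in the scrambles table.
print(sorted(results_events - scramble_events))  # ['333mbo', 'magic', 'mmagic']

# Every event that has scrambles also has results.
print(scramble_events <= results_events)  # True
```

The standard `333` event appears in both tables, so nothing is lost for our purposes.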

#### Information on the abbreviations
There are numerous abbreviations for the different events. We are only concerned with the `333` event.
> **All 3x3x3 Events**
> * `333` - Standard 3x3x3 event
> * `333bf` - 3x3x3 Blindfolded
> * `333oh` - 3x3x3 One Handed
> * `333fm` - 3x3x3 Fewest Moves
> * `333ft` - 3x3x3 With Feet
> * `333mbf` - 3x3x3 Multi-Blindfolded
> * `333mbo` - 3x3x3 Multi-Blind Old Style
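
Every 3x3x3 event ID starts with `333`, so a prefix match with pandas' `Series.str.startswith` would select all of the variants, while an exact comparison keeps only the standard event. The sketch uses a toy DataFrame as a stand-in for `results_frame`, where only the `eventId` column matters.

```python
import pandas as pd

# Toy stand-in for results_frame; only the eventId column matters here.
df = pd.DataFrame({"eventId": ["333", "333oh", "444", "333bf", "pyram", "333mbo"]})

# Prefix match: every 3x3x3 variant.
all_333 = df[df["eventId"].str.startswith("333")]
print(all_333["eventId"].tolist())  # ['333', '333oh', '333bf', '333mbo']

# Exact match: only the standard event, which is what this tutorial keeps.
standard = df[df["eventId"] == "333"]
print(standard["eventId"].tolist())  # ['333']
```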

#### Trimming the Data
Now that we know what we are looking for we can remove any unnecessary information in our tables.

```python
results_frame = results_frame[results_frame['eventId'] == '333']
scrambles_frame = scrambles_frame[scrambles_frame['eventId'] == '333']
```
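
Because the files are so large, another option is to filter while loading, so the full tables never sit in memory at once. Here is a minimal sketch using the `chunksize` argument of `pd.read_csv`, with a hypothetical helper `load_event`. One subtlety: a chunk that happens to contain only `333` rows could have its `eventId` column parsed as integers, so the sketch forces that column to be read as strings with `dtype`. The small in-memory sample stands in for a file such as `extracted_tsv/WCA_export_Scrambles.tsv`.

```python
import io

import pandas as pd


def load_event(path_or_buffer, event_id="333", chunksize=200_000, usecols=None):
    """Hypothetical helper: read a WCA TSV in chunks and keep one event's rows.

    eventId is read as str so that '333' still compares equal even when a
    chunk contains nothing but numeric-looking IDs.
    """
    kept = []
    reader = pd.read_csv(path_or_buffer, sep="\t", chunksize=chunksize,
                         usecols=usecols, dtype={"eventId": str})
    for chunk in reader:
        kept.append(chunk[chunk["eventId"] == event_id])
    # Each chunk keeps its own row labels, so concat preserves them.
    return pd.concat(kept)


# Tiny in-memory stand-in for a real WCA file.
sample = io.StringIO(
    "competitionId\teventId\tscramble\n"
    "A\tpyram\tU R' L'\n"
    "A\t333\tR U R' U'\n"
    "B\t333\tF2 D2\n"
)
filtered = load_event(sample, chunksize=2)
print(filtered)
```

With a real file you would call something like `load_event("extracted_tsv/WCA_export_Scrambles.tsv")`. Passing `usecols` can shrink memory further if you only need a few columns.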

### Determining How the Tables are Related
At this point we have two tables: one for the scrambles and one for individual results. The challenge now is to determine which results correspond to which scrambles.
Because documentation for the data is scarce, the best way to develop an understanding is by poking around and viewing the data. 

#### Poking around
Here is some code that will help you view the data and draw some conclusions about how the tables are related:
```python
# A small case study: only the scrambles and results from the GaleriesDorianOpen2014 competition

# getting the scrambles from the competition
scrambles_temp = scrambles_frame[scrambles_frame['competitionId'] == 'GaleriesDorianOpen2014']
# getting the results from the competition
results_temp = results_frame[results_frame['competitionId'] == 'GaleriesDorianOpen2014']

# viewing scrambles and results (display() is available in Jupyter notebooks)
display(scrambles_temp)
display(results_temp)

# optional: viewing the move sequence of one scramble (72 is a row label kept from scrambles_frame)
print(scrambles_temp.loc[72, 'scramble'])
```
The results from the code are omitted for the sake of brevity.

#### Observations and Findings


```python
len(scrambles_frame['competitionId'].unique())
```

```
5355
```

```python
scrambles_temp = scrambles_frame[scrambles_frame['competitionId'] == 'GaleriesDorianOpen2014']
scrambles_temp.head(50)
```

| | scrambleId | competitionId | eventId | roundTypeId | groupId | isExtra | scrambleNum | scramble |
|---|---|---|---|---|---|---|---|---|
| 31 | 32 | GaleriesDorianOpen2014 | 333 | 1 | A | 0 | 1 | B2 R2 F2 L' U2 R D2 R D' U2 F2 L' R' D L' F L D L |
| 32 | 33 | GaleriesDorianOpen2014 | 333 | 1 | A | 0 | 2 | R U' D' R2 D' R D B2 L B2 D' F B2 D2 R2 F' D2 ... |
| 33 | 34 | GaleriesDorianOpen2014 | 333 | 1 | A | 0 | 3 | U R' U F2 R U F B2 U R' U2 F' L2 B2 R2 U2 B U2... |
| 34 | 35 | GaleriesDorianOpen2014 | 333 | 1 | A | 0 | 4 | B D2 F' R2 D2 B U2 B' L' R' D U' L' B R F' R' ... |
| 35 | 36 | GaleriesDorianOpen2014 | 333 | 1 | A | 0 | 5 | D2 L2 R F2 D2 F2 R' U2 L B' U F R2 B' D' F2 R ... |
| 36 | 37 | GaleriesDorianOpen2014 | 333 | 1 | A | 1 | 1 | F2 L' F2 R2 B2 F2 R D2 L' U2 F U2 R D U L D' B... |
| 37 | 38 | GaleriesDorianOpen2014 | 333 | 1 | A | 1 | 2 | B2 D' L2 U' R2 B2 D' B2 F2 L' F D L2 F R' D L'... |
| 38 | 39 | GaleriesDorianOpen2014 | 333 | 1 | B | 0 | 1 | F2 D2 B U2 R2 D2 R2 U2 F L' F2 D' B' R' D' U R... |
| 39 | 40 | GaleriesDorianOpen2014 | 333 | 1 | B | 0 | 2 | R2 B2 D' F2 U' L2 F2 D2 L' B2 R U' B' F2 L F2 ... |
| 40 | 41 | GaleriesDorianOpen2014 | 333 | 1 | B | 0 | 3 | F' B' L' F2 L' U2 L B' L' U' B' D2 R B2 R2 L' ... |


```python
scrambles_temp.loc[72,'scramble']
```

```
"D F2 U2 R2 U F2 U' F2 U' R' D' U2 B U2 R' F2 D' F2 U F"
```
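
Earlier we said that scrambles only use face turns. A regular expression gives a quick way to check that claim across a whole column with pandas' `Series.str.fullmatch`. This is a minimal sketch: the pattern is my own and assumes single spaces between moves with no leading or trailing whitespace. The two sample strings are the 3x3x3 scramble above and the Pyraminx scramble from the first look at the data, which uses lowercase tip moves and so should fail.

```python
import pandas as pd

# One face turn: a face letter, optionally followed by "2" or "'".
FACE_TURN = r"[UDLRFB][2']?"
# A whole scramble: face turns separated by single spaces.
SCRAMBLE_PATTERN = rf"{FACE_TURN}(?: {FACE_TURN})*"

scrambles = pd.Series([
    "D F2 U2 R2 U F2 U' F2 U' R' D' U2 B U2 R' F2 D' F2 U F",  # 3x3x3 scramble above
    "U R' L' B U B' R' B' L' U L' u' r' b'",                   # Pyraminx scramble
])
print(scrambles.str.fullmatch(SCRAMBLE_PATTERN).tolist())  # [True, False]
```

Running `scrambles_frame['scramble'].str.fullmatch(SCRAMBLE_PATTERN).all()` would apply the same check to every `333` scramble in the export.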

```python
results_temp = results_frame[results_frame['competitionId'] == 'GaleriesDorianOpen2014']
results_temp
```

| | competitionId | eventId | roundTypeId | pos | best | average | personName | personId | personCountryId | formatId | value1 | value2 | value3 | value4 | value5 | regionalSingleRecord | regionalAverageRecord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 449810 | GaleriesDorianOpen2014 | 333 | 1 | 1 | 811 | 1034 | Jules Desjardin | 2010DESJ01 | France | a | 1115 | 972 | 811 | 1016 | 1993 | | |
| 449811 | GaleriesDorianOpen2014 | 333 | 1 | 2 | 1044 | 1134 | Valentin Hoffmann | 2011HOFF02 | France | a | 1113 | 1130 | 1044 | 1159 | 1580 | | |
| 449812 | GaleriesDorianOpen2014 | 333 | 1 | 3 | 897 | 1314 | Julien Rochette | 2009ROCH01 | France | a | 1218 | 1300 | 897 | -1 | 1425 | | |
| 449813 | GaleriesDorianOpen2014 | 333 | 1 | 4 | 1163 | 1324 | Thomas Pouget | 2011POUG01 | France | a | 1163 | 1431 | 1346 | 1468 | 1194 | | |
| 449814 | GaleriesDorianOpen2014 | 333 | 1 | 5 | 1343 | 1412 | Mario Laurent | 2008LAUR01 | France | a | 1488 | 1544 | 1343 | 1369 | 1380 | | |
| 449815 | GaleriesDorianOpen2014 | 333 | 1 | 6 | 1318 | 1426 | Philippe Virouleau | 2008VIRO01 | France | a | 1355 | 1396 | 1528 | 1616 | 1318 | | |
| 449816 | GaleriesDorianOpen2014 | 333 | 1 | 7 | 1302 | 1429 | Alexandre Richard | 2013RICH01 | France | a | 1302 | 1394 | 1686 | 1375 | 1518 | | |
| 449817 | GaleriesDorianOpen2014 | 333 | 1 | 8 | 1191 | 1472 | Alban Reynaud | 2011REYN02 | France | a | 1191 | 1388 | 1661 | 1597 | 1430 | | |
| 449818 | GaleriesDorianOpen2014 | 333 | 1 | 9 | 1313 | 1524 | Pierre Raynal | 2008RAYN01 | France | a | 1453 | 1583 | 1581 | 1313 | 1538 | | |
| 449819 | GaleriesDorianOpen2014 | 333 | 1 | 10 | 1381 | 1569 | Kevin Guillaumond | 2009GUIL01 | France | a | 1591 | 1586 | 1543 | 1577 | 1381 | | |
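
Both tables share `competitionId`, `eventId` and `roundTypeId`, and each result row has five attempts (`value1` to `value5`) while each group has five regular scrambles (`scrambleNum` 1 to 5, `isExtra == 0`). One *hypothesis* worth testing is that attempt *i* was solved on regular scramble number *i*. The sketch below shows how that could be wired up with `DataFrame.melt` (wide to long, one row per attempt) and `DataFrame.merge`. The frames are tiny stand-ins: the two result rows are borrowed from the output above, and the scramble strings `s1` to `s5` are placeholders. A big caveat: the results table has no `groupId`, so for a round with several groups (like group `A` and `B` above) this join would pair each attempt with every group's scramble. Treat this as a starting point to check against the data, not as the established mapping.

```python
import pandas as pd

# Stand-ins for results_temp and scrambles_temp (placeholder scramble strings).
results = pd.DataFrame({
    "competitionId": ["GaleriesDorianOpen2014"] * 2,
    "eventId": ["333"] * 2,
    "roundTypeId": ["1"] * 2,
    "personId": ["2010DESJ01", "2011HOFF02"],
    "value1": [1115, 1113], "value2": [972, 1130], "value3": [811, 1044],
    "value4": [1016, 1159], "value5": [1993, 1580],
})
scrambles = pd.DataFrame({
    "competitionId": ["GaleriesDorianOpen2014"] * 6,
    "eventId": ["333"] * 6,
    "roundTypeId": ["1"] * 6,
    "groupId": ["A"] * 6,
    "isExtra": [0, 0, 0, 0, 0, 1],
    "scrambleNum": [1, 2, 3, 4, 5, 1],
    "scramble": ["s1", "s2", "s3", "s4", "s5", "extra1"],
})

keys = ["competitionId", "eventId", "roundTypeId"]

# Wide -> long: one row per attempt; "value3" becomes scrambleNum 3.
long_results = results.melt(
    id_vars=keys + ["personId"],
    value_vars=[f"value{i}" for i in range(1, 6)],
    var_name="attempt", value_name="value",
)
long_results["scrambleNum"] = long_results["attempt"].str[-1].astype(int)

# Hypothesis: attempt i used regular (non-extra) scramble number i.
regular = scrambles[scrambles["isExtra"] == 0]
merged = long_results.merge(regular, on=keys + ["scrambleNum"], how="left")
print(merged[["personId", "scrambleNum", "value", "groupId", "scramble"]])
```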
