<a href="https://colab.research.google.com/github/dcshapiro/AI-Feynman/blob/master/AI_Feynman_cleared_output.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AI Feynman 2.0: Learning Regression Equations From Data

### Clone repository and install dependencies

In [None]:
!git clone https://github.com/SJ001/AI-Feynman.git

Look at what we downloaded

In [None]:
!ls /content/AI-Feynman
# %pycat AI-Feynman/requirements.txt if you need to fix the dependencies

Fix broken requirements file (may not be needed if later versions fix this).

In [None]:
%%writefile AI-Feynman/requirements.txt
torch>=1.4.0
matplotlib
sympy==1.4
pandas
scipy
sortedcontainers

Install dependencies not already installed in Google Colab

In [None]:
!pip install -r AI-Feynman/requirements.txt

Check that Fortran is installed

In [None]:
!gfortran --version

Check the OS version

In [None]:
!lsb_release -a

Install the csh shell

In [None]:
!sudo apt-get install csh
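
Before compiling, it can help to confirm from Python that `gfortran` and `csh` are both on the PATH, so the notebook can stop early if a tool is missing. A minimal sketch using only the standard library; the helper name `find_tools` is hypothetical and not part of AI-Feynman:

```python
import shutil

def find_tools(names=("gfortran", "csh")):
    """Map each tool name to its path on PATH, or None if it is not installed."""
    return {name: shutil.which(name) for name in names}

tools = find_tools()
for name, path in tools.items():
    print(f"{name}: {path or 'NOT FOUND'}")
```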

Set loose permissions to avoid some reported file permissions issues

In [None]:
!chmod +777 /content/AI-Feynman/Code/*

### Compile the Fortran code

Look at the code directory

In [None]:
!ls -l /content/AI-Feynman/Code

Compile .f files into .x files

In [None]:
!cd /content/AI-Feynman/Code/ && ./compile.sh

### Run the first example from the AI-Feynman repository

Change working directory to the Code directory

In [None]:
import os
os.chdir("/content/AI-Feynman/Code/")
print(os.getcwd())

In [None]:
!pwd

In [None]:
%%writefile ai_feynman_magic.py
from S_run_aifeynman import run_aifeynman
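# Run example 1 as the regression dataset.
# Positional arguments, per the AI-Feynman README: data directory, data file name,
# brute-force time limit in seconds (30), and the operator-set file for brute force ("14ops.txt").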
run_aifeynman("/content/AI-Feynman/example_data/","example1.txt",30,"14ops.txt", polyfit_deg=3, NN_epochs=400)

Look at the first line of the example 1 file

In [None]:
!head -n 1 /content/AI-Feynman/example_data/example1.txt

In [None]:
# Example 1 has data generated from an equation, where the last column is the regression target, and the rest of the columns are the input data
# The following example shows the relationship between the first line of the file example1.txt and the formula used to make the data
x=[1.6821347439986711,1.1786188905177983,4.749225735259924,1.3238356535004034,3.462199507094163]
x0,x1,x2,x3=x[0],x[1],x[2],x[3]
(x0**2 - 2*x0*x1 + x1**2 + x2**2 - 2*x2*x3 + x3**2)**0.5
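
To make the match explicit, the sketch below evaluates the same formula on the first four values and compares the result with the fifth value, which is the target column. The formula is rewritten in the equivalent form sqrt((x0 - x1)^2 + (x2 - x3)^2), and the tolerance is an arbitrary choice.

```python
import math

# First row of example1.txt: four inputs followed by the regression target
row = [1.6821347439986711, 1.1786188905177983, 4.749225735259924,
       1.3238356535004034, 3.462199507094163]
x0, x1, x2, x3, target = row

# Expanded polynomial from the cell above, collapsed into two squared differences
predicted = math.sqrt((x0 - x1) ** 2 + (x2 - x3) ** 2)

print(predicted, target)
print(abs(predicted - target) < 1e-6)
```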

Run the code. It takes a long time, so go get some coffee.

In [None]:
!cd /content/AI-Feynman/Code/ && python3 ai_feynman_magic.py

### Assess the results

In [None]:
!cat results.dat 

We found a candidate with an excellent fit. Let's see what we got.

In [None]:
!ls -l /content/AI-Feynman/Code/results/

In [None]:
!ls -l /content/AI-Feynman/Code/results/NN_trained_models/models

In [None]:
!cat /content/AI-Feynman/Code/results/solution_example1.txt

Note in the cell above that the solution with the lowest loss is the formula this data was generated from

### Try our own dataset generation and equation learning

Until now we were not storing the results in Google Drive. We might want to keep the data in Drive so that the results don't disappear when this Colab instance gets nice and dead.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Make a directory in the mounted Google Drive where we will do our work

In [None]:
!mkdir -p /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman

Copy over the stuff we did so far, and from now on we work out of Google Drive

In [None]:
!cp -r /content/AI-Feynman /content/gdrive/My\ Drive/Lemay.ai_research/

The code below generates our regression example dataset

We generate points for 4 input columns, where x1 is a copy of x0 and x3 is a copy of x2.
The last column is `y = -0.5*x1 + 0.5*x3 + 3`.

In [None]:
import os
import random

os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data")

def getY(x01,x23):
  y = -0.5*x01+0.5*x23+3
  return y

def getRow():
  [x0,x2]=[random.random() for x in range(2)]
  x1=x0
  x3=x2
  y=getY(x1,x3)
  return str(x0)+" "+str(x1)+" "+str(x2)+" "+str(x3)+" "+str(y)+"\n"

# The with block closes the file automatically
with open("duplicateVarsExample.txt", "w") as f:
  for _ in range(10000):
    f.write(getRow())

# switch back to the code directory
os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/Code")

Let's look at our data

In [None]:
!head -n 20 ../example_data/duplicateVarsExample.txt

Let's also plot the data for x01 and x23 against Y

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('seaborn-whitegrid')
import numpy as np

df=pd.read_csv("../example_data/duplicateVarsExample.txt",sep=" ",header=None)
df.plot.scatter(x=0, y=4)
df.plot.scatter(x=2, y=4)
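
Because the generating equation is linear, an ordinary least-squares fit is a quick sanity check on a dataset like this before spending time on the symbolic search. The sketch below regenerates comparable data in memory rather than reading the file, and fits only one column from each duplicate pair, since including both copies would make the design matrix rank-deficient. The seed and sample size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
n = 1000
x01 = rng.random(n)                     # stands in for the duplicated x0/x1 column
x23 = rng.random(n)                     # stands in for the duplicated x2/x3 column
y = -0.5 * x01 + 0.5 * x23 + 3          # same equation as getY above

# Design matrix: one column per distinct input plus a constant term
A = np.column_stack([x01, x23, np.ones(n)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coeffs)  # slope for x01, slope for x23, intercept
```

If the recovered coefficients are far from -0.5, 0.5, and 3, the problem is in the data generation rather than in AI-Feynman.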

In [None]:
!pwd

Let's write out the runner file for this experiment

In [None]:
%%writefile ai_feynman_duplicate_variables.py
from S_run_aifeynman import run_aifeynman
run_aifeynman("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data/","duplicateVarsExample.txt",30,"14ops.txt", polyfit_deg=3, NN_epochs=400)

Don't forget to loosen the file permissions

In [None]:
!chmod 777 /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/*
!chmod +x /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/*.scr

Now we run the file, and go get more coffee, because this is not going to be fast...

In [None]:
!python3 ai_feynman_duplicate_variables.py

Initial models quickly mapped to x0 and x2 (the system realized x1 and x3 are duplicates and so not needed)

Later on the system found 3.000000000000+log(sqrt(exp((x2-x1)))) which is a bit crazy but looks like a plane

We can see on Wolfram Alpha that an equivalent form of this equation is:

(x2 - x1)/2 + 3.000000000000 

which is what we used to generate the dataset!

Link: https://www.wolframalpha.com/input/?i=3.000000000000%2Blog%28sqrt%28exp%28%28x2-x1%29%29%29%29
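
The same equivalence can be checked numerically without leaving the notebook. The sketch below evaluates both forms at random points in [0, 1), the range used to generate the data, and reports the largest difference. The function names and the seed are hypothetical, chosen only for this illustration.

```python
import math
import random

random.seed(0)  # fixed seed, chosen only for reproducibility

def found_expression(x1, x2):
    # Expression reported by the solver
    return 3.0 + math.log(math.sqrt(math.exp(x2 - x1)))

def simplified_expression(x1, x2):
    # Equivalent form given above
    return (x2 - x1) / 2 + 3.0

max_diff = 0.0
for _ in range(1000):
    x1, x2 = random.random(), random.random()
    diff = abs(found_expression(x1, x2) - simplified_expression(x1, x2))
    max_diff = max(max_diff, diff)

print(max_diff)  # should be at the level of floating-point rounding
```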

In [None]:
!ls -l /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/results/

In [None]:
!cat /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/results/solution_duplicateVarsExample.txt

The solver settled on *log(sqrt(exp(-x1 + x3))) + 3.0* which we know is correct

Now, that was a bit of a softball problem as it has an exact solution. Let's now add noise to the dataset and see how the library holds up

### Let's add a small amount of noise to every variable and see the fit quality

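We do the same thing as before, but now we add Gaussian noise (mean 0, standard deviation 0.05) to x0, x1, x2, and x3 after generating y. The noise is drawn independently for each column, so x0 and x1 (and x2 and x3) are no longer exact duplicates, while y itself stays noise-free.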

In [None]:
import os
import random
import numpy as np  # needed for np.random.normal in getRow

os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data")

def getY(x01,x23):
  y = -0.5*x01+0.5*x23+3
  return y

def getRow():
  x=[random.random() for x in range(4)]
  x[1]=x[0]
  x[3]=x[2]
  y=getY(x[1],x[3])
  mu=0
  sigma=0.05
  noise=np.random.normal(mu, sigma, 4)
  x=x+noise
  return str(x[0])+" "+str(x[1])+" "+str(x[2])+" "+str(x[3])+" "+str(y)+"\n"

# The with block closes the file automatically
with open("duplicateVarsWithNoise100k.txt", "w") as f:
  for _ in range(100000):
    f.write(getRow())

# switch back to the code directory
os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/Code")

Let's have a look at the data

In [None]:
!head -n 20 ../example_data/duplicateVarsWithNoise100k.txt
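
Because the noise is added to the inputs after y is computed, even the true equation no longer reproduces y exactly when evaluated on the stored columns. For independent Gaussian noise with standard deviation 0.05 on x1 and x3, the residual y - (-0.5*x1 + 0.5*x3 + 3) has standard deviation 0.05 * sqrt(0.5^2 + 0.5^2), about 0.035. The sketch below checks this in memory with NumPy. The seed and the vectorized generation are assumptions for illustration and do not reproduce the cell above exactly.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen only for reproducibility
n, sigma = 100_000, 0.05

x1_true = rng.random(n)
x3_true = rng.random(n)
y = -0.5 * x1_true + 0.5 * x3_true + 3           # y is computed before noise is added

# Independent noise on each stored input column
x1_noisy = x1_true + rng.normal(0, sigma, n)
x3_noisy = x3_true + rng.normal(0, sigma, n)

# Residual of the true equation evaluated on the noisy columns
residual = y - (-0.5 * x1_noisy + 0.5 * x3_noisy + 3)
expected_std = sigma * np.sqrt(0.5**2 + 0.5**2)

print(residual.std(), expected_std)
```

This gives a rough floor for the error any candidate formula can reach on the noisy file, which is useful context when reading the reported losses.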

Now let's plot the data

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('seaborn-whitegrid')
import numpy as np

df=pd.read_csv("../example_data/duplicateVarsWithNoise100k.txt",sep=" ",header=None)
df.plot.scatter(x=0, y=4)
df.plot.scatter(x=1, y=4)
df.plot.scatter(x=2, y=4)
df.plot.scatter(x=3, y=4)

In [None]:
%%writefile ai_feynman_duplicateVarsWithNoise.py
from S_run_aifeynman import run_aifeynman
run_aifeynman("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data/","duplicateVarsWithNoise100k.txt",30,"14ops.txt", polyfit_deg=3, NN_epochs=600)

In [None]:
!chmod +777 /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/*
!chmod +777 /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/*
# switch back to the code directory
os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/Code/")

In [None]:
!pwd

In [None]:
!chmod +x /content/gdrive/My\ Drive/Lemay.ai_research/AI-Feynman/Code/*.scr
!ls -l *.scr

In [None]:
print(os.getcwd())
!sudo python3 ai_feynman_duplicateVarsWithNoise.py

In [None]:
%%writefile ai_feynman_duplicateVarsWithNoise3.py
from S_run_aifeynman import run_aifeynman
run_aifeynman("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data/","duplicateVarsWithNoise100k.txt",30,"19ops.txt", polyfit_deg=3, NN_epochs=1000)

In [None]:
print(os.getcwd())
!sudo python3 ai_feynman_duplicateVarsWithNoise3.py

### No duplicate columns but same noise

In [None]:
import os
import random
import numpy as np
os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data")

def getY(x01,x23):
  y = -0.5*x01+0.5*x23+3
  return y

def getRow():
  x=[0 for x in range(4)]
  x[1]=random.random()
  x[3]=random.random()
  y=getY(x[1],x[3])
  mu=0
  sigma=0.05
  noise=np.random.normal(mu, sigma, 4)
  x=x+noise
  return str(x[1])+" "+str(x[3])+" "+str(y)+"\n"

# The with block closes the file automatically
with open("varsWithNoise.txt", "w") as f:
  for _ in range(100000):
    f.write(getRow())

# switch back to the code directory
os.chdir("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/Code")

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use('seaborn-whitegrid')
import numpy as np

df=pd.read_csv("../example_data/varsWithNoise.txt",sep=" ",header=None)
df.plot.scatter(x=0, y=2)
df.plot.scatter(x=1, y=2)

In [None]:
%%writefile ai_feynman_varsWithNoise.py
from S_run_aifeynman import run_aifeynman
run_aifeynman("/content/gdrive/My Drive/Lemay.ai_research/AI-Feynman/example_data/","varsWithNoise.txt",30,"14ops.txt", polyfit_deg=3, NN_epochs=1000)

In [None]:
!sudo python3 ai_feynman_varsWithNoise.py