
# Imports and mounting

In [16]:
import pandas as pd
from google.colab import widgets
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
from glob import glob
import statsmodels.api as sm

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Getting data

**The steps for getting data**

Batch:
1. Identify the folder your results are in
2. Pull in all the files in that folder
3. Parse the filenames
4. Get certain statistics on each run and flag any interesting ones
5. Plot batch trends

Individual:
1. Get the file path
2. Parse the filename
3. Get run stats
4. Plot standard run plots

In [3]:
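# Ok, we need to get the parameters from these filenames.
# A filename looks like
#   _30x50.100.050.40.033.033.s.000.100.1594929466.csv
# i.e. _{pool}x{itns}.{bbbp}.{lf}.{n}.{pc}.{bp}.{method}.{method pars}.{timestamp}.csv,
# with fractional parameters stored as integer percentages (050 -> 0.5).

def get_pars(path, verbose=True):
  """Parse the run parameters out of a results filename and return them as a dict."""
  chunks = path.split("/")[-1].split(".")
  # Strip the leading underscore, if any.
  if chunks[0].startswith("_"):
    chunks[0] = chunks[0][1:]
  pool, itns = (int(c) for c in chunks[0].split("x"))
  pars = {
    "pool": pool,
    "itns": itns,
    "bbbp": int(chunks[1]) / 100,
    "lf": int(chunks[2]) / 100,
    "n": int(chunks[3]),
    "pc": int(chunks[4]) / 100,
    "bp": int(chunks[5]) / 100,
    "method": chunks[6],
  }
  # The remaining parameters depend on the method.
  if pars["method"] == "b":
    pars["hbfl"] = int(chunks[7]) / 100
    pars["hbfr"] = int(chunks[8]) / 100
    pars["hnifl"] = int(chunks[9]) / 100
    pars["hnifr"] = int(chunks[10]) / 100
  elif pars["method"] == "s":
    pars["hbf"] = int(chunks[7]) / 100
    pars["hnif"] = int(chunks[8]) / 100
  elif pars["method"] == "p":
    pars["lbd"] = chunks[7]
    pars["pf"] = chunks[8]
  else:
    raise ValueError(f"Bad filename: {path}")
  pars["timestamp"] = chunks[-2]

  # Print only the parameters that exist for this method.
  if verbose:
    for key, value in pars.items():
      print(f"{key} = {value}")
  return pars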
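Because `get_pars` reads parameters by position, every new simulation parameter shifts the indices it depends on. One possible alternative, sketched here under our own assumptions and not the project's actual naming scheme, is to store each parameter as a `key-value` pair so the filename describes itself. `encode_pars`, `decode_pars` and `_convert` are hypothetical names, and the sketch assumes parameter keys contain neither `_` nor `-`.

```python
# Hypothetical self-describing filename scheme (not the project's format):
# each parameter is written as "key-value", pairs are joined with "_",
# and the run timestamp comes last. Assumes keys contain no "_" or "-".

def encode_pars(pars, timestamp, ext="csv"):
  """Build e.g. 'pool-30_itns-50_method-s_1594929466.csv' from a dict."""
  parts = [f"{key}-{value}" for key, value in pars.items()]
  return "_".join(parts + [str(timestamp)]) + "." + ext

def _convert(value):
  """Turn '30' into 30 and '0.5' into 0.5; leave other strings alone."""
  for cast in (int, float):
    try:
      return cast(value)
    except ValueError:
      pass
  return value

def decode_pars(path):
  """Invert encode_pars, returning (pars, timestamp)."""
  stem = path.split("/")[-1].rsplit(".", 1)[0]  # drop the extension only
  *parts, timestamp = stem.split("_")
  pars = {}
  for part in parts:
    key, value = part.split("-", 1)  # split once, so negative values survive
    pars[key] = _convert(value)
  return pars, timestamp

name = encode_pars({"pool": 30, "itns": 50, "bbbp": 1.0, "method": "s"}, 1594929466)
print(name)  # pool-30_itns-50_bbbp-1.0_method-s_1594929466.csv
print(decode_pars("/some/folder/" + name))
```

The trade-off is longer filenames in exchange for never having to count dots; strings that look numeric come back as numbers.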



# Batch

Ok! Let's imagine what we might need to do to process a batch run.

In [4]:
# First get the folder.
folder = '/content/drive/Shared drives/Homochirality/output/test/'

# Use glob to get all the files.
for filename in glob(f"{folder}*.csv"):
  print(filename)
  # So this gives me the filenames. Now I can write a helper function to 
  # unpack the names, give me the data files, and store the parameters somewhere.
  get_pars(filename)
  # That just prints them for now. 

/content/drive/Shared drives/Homochirality/output/test/_30x50.100.050.40.033.033.s.000.100.1594929466.csv
pool = 30
itns = 50
bbbp = 1.0
lf = 0.5
n = 40
pc = 0.33
bp = 0.33
method = s
hbf = 0.0
hnif = 1.0
timestamp = 1594929466
/content/drive/Shared drives/Homochirality/output/test/_30x50.000.050.40.033.033.s.000.100.1594929490.csv
pool = 30
itns = 50
bbbp = 0.0
lf = 0.5
n = 40
pc = 0.33
bp = 0.33
method = s
hbf = 0.0
hnif = 1.0
timestamp = 1594929490
/content/drive/Shared drives/Homochirality/output/test/_30x50.100.050.40.033.033.s.100.000.1594929523.csv
pool = 30
itns = 50
bbbp = 1.0
lf = 0.5
n = 40
pc = 0.33
bp = 0.33
method = s
hbf = 1.0
hnif = 0.0
timestamp = 1594929523
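For step 5 (plotting batch trends) it helps to have every run's parameters in one table instead of printed text. Assuming `get_pars` returns its values as a dict, those dicts can go straight into a pandas DataFrame. The sketch below hard-codes the three parameter sets printed above so it runs on its own; `varied_columns` is a hypothetical helper that flags which parameters actually change across the batch (a rough take on step 4).

```python
import pandas as pd

# The parameters printed above for the three test runs, one dict per file.
runs = [
  dict(pool=30, itns=50, bbbp=1.0, lf=0.5, n=40, pc=0.33, bp=0.33,
       method="s", hbf=0.0, hnif=1.0, timestamp="1594929466"),
  dict(pool=30, itns=50, bbbp=0.0, lf=0.5, n=40, pc=0.33, bp=0.33,
       method="s", hbf=0.0, hnif=1.0, timestamp="1594929490"),
  dict(pool=30, itns=50, bbbp=1.0, lf=0.5, n=40, pc=0.33, bp=0.33,
       method="s", hbf=1.0, hnif=0.0, timestamp="1594929523"),
]
batch = pd.DataFrame(runs)

def varied_columns(batch, ignore=("timestamp",)):
  """Return the parameter columns that take more than one value in the batch."""
  return [col for col in batch.columns
          if col not in ignore and batch[col].nunique() > 1]

print(varied_columns(batch))  # ['bbbp', 'hbf', 'hnif']
```

With the varied columns known, batch plots can put one of them on the x axis and colour by another.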


# Individual

Let's look at what needs to be done for an individual file.

In [5]:
# First you need the full path to the file.
path = '/content/drive/Shared drives/Homochirality/output/test/_30x50.100.050.40.033.033.s.000.100.1594929466.csv'

# Once you have that, maybe you want to get the parameters.
# Or, hell, maybe you want to talk to your collaborator about encoding the parameters
# in the filenames. It looks like we're gonna have a lot more variables coming up,
# so who knows how many things we're gonna have to encode!

# Anyway, parameters aside, maybe you want to get some data about the run
# in addition to plotting it?

# Things we might want to know:
  # The trend of %Homochirality vs Length
    # For now, just get the %Homochirality of the longest thing.
  # The trend of %Homochirality vs Age
    # Not sure how to measure this.
  # The length of the longest homochiral chain in the run?

stats = pd.read_csv(path)
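The first item on that wish list, the %Homochirality of the longest polymer, only takes a couple of pandas filters. A minimal sketch, assuming the `Type`, `Length` and `%Homochirality` columns that the plots below use; `run_summary` is a hypothetical helper, and the toy frame (including its `Monomer` type label) is made up, not simulation output.

```python
import pandas as pd

def run_summary(stats):
  """Summarise one run: longest polymer length and its %Homochirality."""
  polymers = stats[stats["Type"] == "Polymer"]
  longest = polymers["Length"].max()
  # Several polymers can tie for longest, so keep all of their values.
  homo_of_longest = polymers.loc[polymers["Length"] == longest, "%Homochirality"]
  return {
    "max_length": int(longest),
    "homochirality_of_longest": homo_of_longest.tolist(),
    "mean_homochirality": polymers["%Homochirality"].mean(),
  }

# Made-up example data.
toy = pd.DataFrame({
  "Type": ["Polymer", "Polymer", "Polymer", "Monomer"],
  "Length": [3, 5, 5, 1],
  "%Homochirality": [0.5, 0.75, 1.0, 1.0],
})
print(run_summary(toy))
```

On the real data this would be called as `run_summary(stats)`.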


## Plotting an individual run
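Several tabs in the next cell size their markers by how many polymers share a value, using `groupby(...)[col].value_counts()`. A small made-up example shows the shape of that table: one row per (group, value) pair plus a `count` column. Passing `name='count'` to `reset_index` avoids a clash on older pandas versions, where the counts Series is named after the counted column and would collide with the index level of the same name.

```python
import pandas as pd

# Made-up data: a handful of polymers over two iterations.
df = pd.DataFrame({
  "Iteration": [0, 0, 0, 1, 1],
  "Signed ee": [0.0, 0.0, 1.0, -1.0, -1.0],
})

# One row per (Iteration, Signed ee) pair, with how many polymers had it.
counts = (df.groupby("Iteration")["Signed ee"]
            .value_counts()
            .reset_index(name="count"))
print(counts)
```

The `count` column is what `px.scatter(..., size='count')` turns into bubble size.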

In [56]:


plots = ["polyspread plotly","leftright chirality",
         "homochirality vs length","Homochirality vs age", 
         "homochirality over time","Homo_Hist","Stats"]
tb = widgets.TabBar(plots)



with tb.output_to("polyspread plotly"):
  df = stats
  counts = df.groupby("Iteration")['Signed ee'].value_counts().reset_index(name='count')
  fig = px.scatter(counts, x="Iteration", y="Signed ee", size='count')
  fig.show()


with tb.output_to("leftright chirality"):

  df = stats
  lefts = df.groupby("Iteration")["#LeftHomochiral"].sum().rename("LL")
  rights = df.groupby("Iteration")["#RightHomochiral"].sum().rename("RR")
  heteros = (df.groupby("Iteration")['Length'].sum() - df.groupby("Iteration")['Length'].count()).rename("LR")
  bondcounts = pd.DataFrame([lefts, rights, heteros]).transpose()
  bondcounts["Total"] = bondcounts["LL"] + bondcounts["RR"] + bondcounts["LR"]
  bondcounts = bondcounts.apply(lambda x : x / bondcounts["Total"])
  
  fig = go.Figure()
  fig.add_trace(go.Scatter(y=bondcounts["LL"],
                      mode='lines',
                      name='left homochiral'))
  fig.add_trace(go.Scatter(y=bondcounts["RR"],
                      mode='lines',
                      name='right homochiral'))
  fig.add_trace(go.Scatter(y=bondcounts["LR"],
                      mode='lines', name='heterochiral'))
  fig.update_layout(title='Proportion of LL, RR, and LR bonds by iteration',
                   xaxis_title='Iteration',
                   yaxis_title='Proportion')
  fig.update_yaxes(range=[0, 1])

  fig.show()

with tb.output_to("homochirality over time"):
  df = stats
  maxlen = df["Length"].max()

  fig = px.scatter(df, x="Length", y="%Homochirality",animation_frame="Iteration",
                   range_x=[-1,maxlen+1],range_y=[-0.01,1.2])
  fig.show()


with tb.output_to("homochirality vs length"):
  df = stats
  df = df.groupby("Length")['%Homochirality'].value_counts().reset_index(name='count')
  fig = px.scatter(df, x="Length",y="%Homochirality",size='count',range_y=(-0.1,1.1))
  fig.update_traces(marker=dict(line=dict(color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
  # %Homochirality of the longest polymer (first match if several tie).
  print(df.loc[df["Length"] == df["Length"].max(), "%Homochirality"].iloc[0])
  

  ###
  # Playing with graphing
  ###
  #fig.add_trace(go.Scatter(x=df["Length"],y=pars[0] + pars[1]/df["Length"]))


  # a = 5
  # b = 0.68

  # fig.add_trace(go.Scatter(x=df["Length"],y=a/df["Length"]+b))
  # fig.add_trace(go.Scatter(x=df["Length"],y=-a/df["Length"]+b))

  fig.show()



with tb.output_to("Homochirality vs age"):
  df = stats
  df = df.groupby("Age")['%Homochirality'].value_counts().reset_index(name='count')
  fig = px.scatter(df, x="Age",y="%Homochirality",size='count',trendline="ols")
  fig.update_traces(marker=dict(line=dict(color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
  fig.show()

with tb.output_to("Homo_Hist"):
  df = stats
  fig = px.histogram(df,x="%Homochirality")
  fig.show()
  fig2 = px.histogram(df,x="Length")
  fig2.show()


with tb.output_to("Stats"):
  df = stats[stats["Type"]=="Polymer"]
  print("Are older polymers longer?")
  mean = df.groupby("Age")["Length"].mean().reset_index(name='MeanLength')
  stdev = df.groupby("Age")["Length"].std().reset_index(name='SDLength')
  sdx = list(stdev["Age"])
  sdx_rev = list(sdx)[::-1]
  sdy_upper = [m+sd for m,sd in zip(mean["MeanLength"],stdev["SDLength"])]
  sdy_lower = [m-sd for m,sd in zip(mean["MeanLength"],stdev["SDLength"])][::-1]
  fig1 = px.scatter(df,x="Age",y="Length")
  fig1.add_trace(go.Scatter(
    x=sdx+sdx_rev,
    y=sdy_upper+sdy_lower,
    fill="toself",
    line_color="blue"))
  fig1.add_trace(go.Scatter(x=mean["Age"],y=mean["MeanLength"],line_color="black"))
  fig1.show()

  print("Are older polymers more homochiral?")


  mean = df.groupby("Age")["%Homochirality"].mean().reset_index(name='MeanHomo')
  stdev = df.groupby("Age")["%Homochirality"].std().reset_index(name='SDHomo')
  sdx = list(stdev["Age"])
  sdx_rev = list(sdx)[::-1]
  sdy_upper = [m+sd for m,sd in zip(mean["MeanHomo"],stdev["SDHomo"])]
  sdy_lower = [m-sd for m,sd in zip(mean["MeanHomo"],stdev["SDHomo"])][::-1]
  fig1 = px.scatter(df,x="Age",y="%Homochirality")
  fig1.add_trace(go.Scatter(
    x=sdx+sdx_rev,
    y=sdy_upper+sdy_lower,
    fill="toself",
    line_color="blue"))
  fig1.add_trace(go.Scatter(x=mean["Age"],y=mean["MeanHomo"],line_color="black"))
  fig1.show()

  print("Are longer polymers more homochiral?")

  mean = df.groupby("Length")["%Homochirality"].mean().reset_index(name='MeanHomo')
  stdev = df.groupby("Length")["%Homochirality"].std().reset_index(name='SDHomo')
  sdx = list(stdev["Length"])
  sdx_rev = list(sdx)[::-1]
  sdy_upper = [m+sd for m,sd in zip(mean["MeanHomo"],stdev["SDHomo"])]
  sdy_lower = [m-sd for m,sd in zip(mean["MeanHomo"],stdev["SDHomo"])][::-1]
  fig1 = px.scatter(df,x="Length",y="%Homochirality")
  fig1.add_trace(go.Scatter(
    x=sdx+sdx_rev,
    y=sdy_upper+sdy_lower,
    fill="toself",
    line_color="blue",
    connectgaps=True))
  fig1.add_trace(go.Scatter(x=mean["Length"],y=mean["MeanHomo"],line_color="black"))
  fig1.show()

  print("Do more homochiral polymers have longer homochiral chains?")
  print("Do older polymers have longer homochiral chains?")
  print("Do longer polymers have longer homochiral chains?")




0.7590361445783133



Are older polymers longer?


Are older polymers more homochiral?


Are longer polymers more homochiral?


Do more homochiral polymers have longer homochiral chains?
Do older polymers have longer homochiral chains?
Do longer polymers have longer homochiral chains?
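The Stats tab draws its shaded mean ± SD band by walking out along the upper edge and back along the lower edge, so that `fill="toself"` closes the shape; the same bookkeeping is repeated three times. A sketch of pulling it into one helper, with `sd_band` as a hypothetical name. An Age or Length seen only once has an undefined sample standard deviation (NaN), which leaves a hole in the band; filling it with 0 is our assumption here, not something the notebook does.

```python
import pandas as pd

def sd_band(df, x, y):
  """Return (xs, ys) tracing mean ± 1 SD of y at each x as one closed polygon."""
  grouped = df.groupby(x)[y]
  mean = grouped.mean()
  # Single-observation groups have NaN sample SD; treat it as 0 (assumption).
  sd = grouped.std().fillna(0)
  xs = list(mean.index)
  upper = list(mean + sd)
  lower = list(mean - sd)
  # Out along the upper edge, back along the lower edge.
  return xs + xs[::-1], upper + lower[::-1]

# Made-up example data.
toy = pd.DataFrame({"Age": [1, 1, 2, 2, 3], "Length": [2, 4, 3, 5, 7]})
band_x, band_y = sd_band(toy, "Age", "Length")
print(band_x, band_y)

# In the Stats tab this would replace the sdx/sdy bookkeeping, e.g.
# fig1.add_trace(go.Scatter(x=band_x, y=band_y, fill="toself", line_color="blue"))
```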



Ok! Go you. Nice job. Pat on the back. You're very successful.
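One thing worth checking in the "Homochirality vs age" tab: as far as we can tell, `trendline="ols"` fits a least-squares line to the plotted points, and after `value_counts` each unique (Age, %Homochirality) pair is a single point however many polymers it stands for. If the trend should reflect every polymer, one option is a count-weighted fit. The sketch below swaps in `numpy.polyfit` for the statsmodels-backed plotly trendline and uses made-up data; weighting by sqrt(count) gives the same line as fitting the unaggregated rows.

```python
import numpy as np
import pandas as pd

# Made-up aggregated data shaped like the "Homochirality vs age" tab:
# one row per unique (Age, %Homochirality) pair plus how many polymers had it.
agg = pd.DataFrame({
  "Age": [1, 1, 2, 3],
  "%Homochirality": [0.5, 1.0, 0.5, 1.0],
  "count": [9, 1, 1, 1],
})
x = agg["Age"].to_numpy(dtype=float)
y = agg["%Homochirality"].to_numpy(dtype=float)
w = agg["count"].to_numpy(dtype=float)

# Unweighted fit: every unique pair counts once.
slope_u, intercept_u = np.polyfit(x, y, deg=1)

# Weighted fit: polyfit multiplies each residual by w before squaring,
# so w = sqrt(count) weights each squared residual by count.
slope_w, intercept_w = np.polyfit(x, y, deg=1, w=np.sqrt(w))

print(f"unweighted: y = {slope_u:.3f} * Age + {intercept_u:.3f}")
print(f"weighted:   y = {slope_w:.3f} * Age + {intercept_w:.3f}")
```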