## Exercise : Extracting data from OFET-DB to perform ML

Its great that we have data stored in a standardized manner in our database but now we need to extract data from this database in the form of a table containing descriptors [X] and property [y] to perform ML. This notebook will focus on how we can do that. 

First i want you to use the backup file called 20231206_ofetdb_v2_backup11 and add it to your pgadmin as a practice database. We will avoid working with the original database.

In [36]:
# Connect to the database


import psycopg2
import pandas as pd
import numpy as np
import plotly.express as px


#sample connection details
# pgparams = {
#     "host": "127.0.0.1",
#     "database": "ofetdb_testenv",
#     "user": "postgres",
#     "password": "password",
#     "port": "5432",
# }


# Set max number of displayed columns and rows in Jupyter Notebook
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


pgparams = {
    "host": "127.0.0.1",
    "database": "ofetdb_testML",
    "user": "postgres",
    "password": "password",
    "port": "5432",
}

def read_select_query(query):

    with psycopg2.connect(**pgparams) as conn:

        df = pd.read_sql_query(query, conn)

    return df


## Simplest Scenario :

- assume only single polymer scenarios, no blends first (wt_frac = 1)
- assume only single solvent scenarios, no multiple solvents (vol_frac = 1)
- show device substrate information 
- show film deposition information (spin, blade, etc). We will not use the parameters for now
- dont go into detail of solution treatment, substrate pretreat and post process. Just show if treatment was performed
- show hole mobility information

Follow the code blocks below and we will eventually end up with a table containing this information 

1. Prepare a dataframe containing the sample_Id, citation_type and meta information from the experimental table (This one is done as a practice for you)

In [37]:
## Adding all the experiment information 

# SQL query to fetch the required data
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

# Display the resulting DataFrame
#print(result_df)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."


remember to keep adding to the existing query to add more information to the table. It is going to get complicated and long soon FYI. 

2. Now to this database add the solution concentration information

In [38]:
## only way to join is through ofet_process

query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id ;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0


3. Now to this table add the solvent information but only consider devices made from single solvent vol frac = 1

In [39]:
## only way to join is through ofet_process

query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        sms.vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    WHERE 
        sms.vol_frac = 1;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,vol_frac,solvent_pubchem_cid,solvent_iupac_name
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform


4. Now to this table add polymer information only for devices made from one polymer (no blends) (wt-frac = 1)

In [42]:
## only way to join is through ofet_process

query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        sms.vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name,
        smp.wt_frac
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN 
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    WHERE 
        sms.vol_frac=1 and smp.wt_frac=1;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,vol_frac,solvent_pubchem_cid,solvent_iupac_name,wt_frac
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,1.0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,1.0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform,1.0
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,1.0
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,1.0
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,1.0,7964,chlorobenzene,1.0
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,1.0,7239,"1,2-dichlorobenzene",1.0
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,1.0,6212,chloroform,1.0
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,1.0,7239,"1,2-dichlorobenzene",1.0
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,1.0,7239,"1,2-dichlorobenzene",1.0


5. Now to this table add the device substrate information 

In [43]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        sol.concentration,
        sms.vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name,
        smp.wt_frac,
        dev.params
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN 
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    LEFT JOIN
        OFET_PROCESS of ON s.process_id =of.process_id
    LEFT JOIN
        DEVICE_FABRICATION dev ON dev.device_fab_id = of.device_fab_id
    WHERE 
        sms.vol_frac=1 and smp.wt_frac=1;

'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,concentration.1,vol_frac,solvent_pubchem_cid,solvent_iupac_name,wt_frac,params
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,5.0,1.0,6212,chloroform,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,4.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,3.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-..."
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,7.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-..."
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,6.5,1.0,6212,chloroform,1.0,"{'channel_width': 1400.0, 'gate_material': 'n-..."


6. Now to this table add the film deposition type only not the parameters associated with it

In [45]:
query = '''
  SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        sol.concentration,
        sms.vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name,
        smp.wt_frac,
        dev.params,
        fd.deposition_type
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN 
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    LEFT JOIN
        OFET_PROCESS of ON s.process_id =of.process_id
    LEFT JOIN
        DEVICE_FABRICATION dev ON dev.device_fab_id = of.device_fab_id
    LEFT JOIN
        FILM_DEPOSITION fd ON of.process_id = fd.film_deposition_id
    WHERE 
        sms.vol_frac=1 and smp.wt_frac=1;

'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,concentration.1,vol_frac,solvent_pubchem_cid,solvent_iupac_name,wt_frac,params,deposition_type
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,5.0,1.0,6212,chloroform,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,4.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,3.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,7.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,6.5,1.0,6212,chloroform,1.0,"{'channel_width': 1400.0, 'gate_material': 'n-...",spin


7. Now to this table add the solution treatment, substrate preatreat and post process informaiton 

1 if treatment is done and 0 if no treatment



In [47]:
query = '''
      SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        sol.concentration,
        sms.vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name,
        smp.wt_frac,
        dev.params,
        fd.deposition_type,
    CASE
        WHEN ststep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS SOLUTION_TREATMENT,
    CASE
        WHEN spstep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS SUBSTRATE_PRETREAT,
    CASE
        WHEN ppstep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS POST_PROCESS
    
FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN 
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    LEFT JOIN
        OFET_PROCESS of ON s.process_id =of.process_id
    LEFT JOIN
        DEVICE_FABRICATION dev ON dev.device_fab_id = of.device_fab_id
    LEFT JOIN
        FILM_DEPOSITION fd ON of.process_id = fd.film_deposition_id
    LEFT JOIN
        SOLUTION_TREATMENT st ON st.solution_treatment_id = of.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto ON sto.solution_treatment_id =st.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP ststep ON ststep.solution_treatment_step_id = sto.solution_treatment_step_id
    LEFT JOIN
       SUBSTRATE_PRETREAT sp ON sp.substrate_pretreat_id = of.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_ORDER spo ON spo.substrate_pretreat_id = sp.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP spstep ON spstep.substrate_pretreat_step_id = spo.substrate_pretreat_step_id
    LEFT JOIN
        POSTPROCESS pp ON pp.postprocess_id = of.postprocess_id 
    LEFT JOIN
        POSTPROCESS_ORDER ppo ON ppo.postprocess_id = pp.postprocess_id
    LEFT JOIN
        POSTPROCESS_STEP ppstep ON ppstep.postprocess_step_id = ppo.postprocess_step_id
    WHERE 
        sms.vol_frac=1 and smp.wt_frac=1;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,concentration.1,vol_frac,solvent_pubchem_cid,solvent_iupac_name,wt_frac,params,deposition_type,solution_treatment,substrate_pretreat,post_process
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,0,1
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,4.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1,1
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,5.0,1.0,6212,chloroform,1.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1,1
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade,0,0,1
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade,0,0,1
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,5.0,1.0,7964,chlorobenzene,1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,0,1
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,4.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,0,1
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,3.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1,1
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,7.0,1.0,7239,"1,2-dichlorobenzene",1.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1,1
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,6.5,1.0,6212,chloroform,1.0,"{'channel_width': 1400.0, 'gate_material': 'n-...",spin,0,1,1


8. Now to this table add the hole mobility information 

only keep devices that have an actual hole mobility value and is not Null or NAN

In [51]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration as solution_concentration,
        sms.vol_frac as solvent_vol_frac,
        sv.pubchem_cid as solvent_pubchem_cid,
        sv.iupac_name as solvent_iupac_name,
        p.common_name as polymer_common_name,
        p.iupac_name as polymer_iupac_name,
        p.mw as polymer_mw,
        p.mn as polymer_mn,
        p.dispersity as polymer_dispersity,
        dev.params as device_substrate_parameters,
        fd.deposition_type as film_deposition_type,
        fd.params as film_deposition_params,
        
    CASE
        WHEN ststep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS SOLUTION_TREATMENT,
    CASE
        WHEN spstep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS SUBSTRATE_PRETREAT,
    CASE
        WHEN ppstep.treatment_type is not NULL then '1'
        ELSE '0'
    END AS POST_PROCESS,
    CAST(meas.data-> 'hole_mobility'->>'value' AS float) AS hole_mobility
    
    FROM
        SAMPLE s
    JOIN
        EXPERIMENT_INFO e ON s.exp_id = e.exp_id
    LEFT JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    LEFT JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    LEFT JOIN 
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    LEFT JOIN
        POLYMER p ON smp.polymer_id = p.polymer_id
    LEFT JOIN
        SOLVENT sv ON sms.solvent_id = sv.pubchem_cid
    LEFT JOIN
        OFET_PROCESS of ON s.process_id =of.process_id
    LEFT JOIN
        DEVICE_FABRICATION dev ON dev.device_fab_id = of.device_fab_id
    LEFT JOIN
        FILM_DEPOSITION fd ON of.process_id = fd.film_deposition_id
    LEFT JOIN
        SOLUTION_TREATMENT st ON st.solution_treatment_id = of.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto ON sto.solution_treatment_id =st.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP ststep ON ststep.solution_treatment_step_id = sto.solution_treatment_step_id
    LEFT JOIN
       SUBSTRATE_PRETREAT sp ON sp.substrate_pretreat_id = of.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_ORDER spo ON spo.substrate_pretreat_id = sp.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP spstep ON spstep.substrate_pretreat_step_id = spo.substrate_pretreat_step_id
    LEFT JOIN
        POSTPROCESS pp ON pp.postprocess_id = of.postprocess_id 
    LEFT JOIN
        POSTPROCESS_ORDER ppo ON ppo.postprocess_id = pp.postprocess_id
    LEFT JOIN
        POSTPROCESS_STEP ppstep ON ppstep.postprocess_step_id = ppo.postprocess_step_id
    LEFT JOIN
        MEASUREMENT meas ON s.sample_id = meas.sample_id
        
    WHERE 
        sms.vol_frac=1 
        AND smp.wt_frac=1
        AND (meas.data->> 'hole_mobility' IS NOT NULL OR meas.data->> 'hole_mobility' = 'NAN');
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

result_df

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,citation_type,experiment_meta,solution_concentration,solvent_vol_frac,solvent_pubchem_cid,solvent_iupac_name,polymer_common_name,polymer_iupac_name,polymer_mw,polymer_mn,polymer_dispersity,device_substrate_parameters,film_deposition_type,film_deposition_params,solution_treatment,substrate_pretreat,post_process,hole_mobility
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,"{'spin_rate': 1000.0, 'environment': 'air'}",0,0,1,0.11
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,"{'spin_rate': 1500.0, 'spin_time': 60.0, 'envi...",0,1,1,0.29
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",292.2,74.9,3.9,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,{'environment': 'air'},0,1,1,0.23
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,0.81
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",344.0,71.0,4.85,"{'channel_width': 2000.0, 'gate_material': 'n-...",blade,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,1.53
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",501.0,110.0,4.55,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,"{'spin_rate': 3000.0, 'spin_time': 60.0, 'envi...",0,0,1,0.9
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",199.0,55.0,3.62,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,"{'spin_rate': 1000.0, 'spin_time': 60.0}",0,0,1,1.1
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,,blade,"{'blade_angle': 90.0, 'environment': 'air', 't...",0,0,1,8.5
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,"{'spin_rate': 2000.0, 'spin_time': 60.0, 'envi...",0,1,1,6.85
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,"{'spin_rate': 12000.0, 'spin_time': 60.0, 'env...",0,1,1,7.25


This is a simple version of the dataframe result_df. There are a couple more things we can now do to the dataframe to make it more machine readable. We no longer need to use sql queries. can just work with the dataframe and pandas

9. The code block below will unpack the data stored in device_substrate_parameters and store it as columns (this one is done for you)

In [56]:
 #unpacking the information stored in device_substrate_parameters

import pandas as pd
from pandas import json_normalize


# # Extract the 'device_substrate_parameters' column and normalize it
params_df = json_normalize(result_df['device_substrate_parameters'])

# # Concatenate the original DataFrame with the new 'params_df'
result_df = pd.concat([result_df, params_df], axis=1)

# # Drop the original 'device_substrate_parameters' column
result_df_norm = result_df.drop('device_substrate_parameters', axis=1)

# # Display the resulting DataFrame
result_df_norm

Unnamed: 0,sample_id,citation_type,experiment_meta,solution_concentration,solvent_vol_frac,solvent_pubchem_cid,solvent_iupac_name,polymer_common_name,polymer_iupac_name,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_type,film_deposition_params,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,gate_material,channel_length,dielectric_material,dielectric_thickness,electrode_configuration,dielectric_material_2,dielectric_thickness_2,channel_width.1,gate_material.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,electrode_configuration.1,dielectric_material_2.1,dielectric_thickness_2.1
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,spin,"{'spin_rate': 1000.0, 'environment': 'air'}",0,0,1,0.11,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,spin,"{'spin_rate': 1500.0, 'spin_time': 60.0, 'envi...",0,1,1,0.29,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",292.2,74.9,3.9,spin,{'environment': 'air'},0,1,1,0.23,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,blade,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,0.81,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",344.0,71.0,4.85,blade,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,1.53,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",501.0,110.0,4.55,spin,"{'spin_rate': 3000.0, 'spin_time': 60.0, 'envi...",0,0,1,0.9,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",199.0,55.0,3.62,spin,"{'spin_rate': 1000.0, 'spin_time': 60.0}",0,0,1,1.1,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,blade,"{'blade_angle': 90.0, 'environment': 'air', 't...",0,0,1,8.5,,,,,,,,,,,,,,,,
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,spin,"{'spin_rate': 2000.0, 'spin_time': 60.0, 'envi...",0,1,1,6.85,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,spin,"{'spin_rate': 12000.0, 'spin_time': 60.0, 'env...",0,1,1,7.25,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,


In [58]:

# Drop unnecessary columns that won't be used for modeling
columns_to_drop = ['film_deposition_params', 'citation_type', 'experiment_meta', 'solvent_vol_frac', 'solvent_iupac_name', 'polymer_iupac_name']
result_df = result_df_norm.drop(columns=columns_to_drop)

# Display the resulting DataFrame
result_df


Unnamed: 0,sample_id,solution_concentration,solvent_pubchem_cid,polymer_common_name,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_type,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,gate_material,channel_length,dielectric_material,dielectric_thickness,electrode_configuration,dielectric_material_2,dielectric_thickness_2,channel_width.1,gate_material.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,electrode_configuration.1,dielectric_material_2.1,dielectric_thickness_2.1
0,1,4.0,7964,DPP-DTT,299.0,90.0,3.32,spin,0,0,1,0.11,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
1,2,4.0,7964,DPP-DTT,299.0,90.0,3.32,spin,0,1,1,0.29,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
2,3,5.0,6212,DPP-DTT,292.2,74.9,3.9,spin,0,1,1,0.23,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,
3,12,5.0,7964,DPP-DTT,91.0,29.0,3.14,blade,0,0,1,0.81,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
4,13,5.0,7964,DPP-DTT,344.0,71.0,4.85,blade,0,0,1,1.53,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
5,14,5.0,7964,DPP-DTT,501.0,110.0,4.55,spin,0,0,1,0.9,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
6,15,4.0,7239,DPP-DTT,199.0,55.0,3.62,spin,0,0,1,1.1,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,
7,23,6.5,6212,DPP-DTT,91.0,29.0,3.14,blade,0,0,1,8.5,,,,,,,,,,,,,,,,
8,24,3.0,7239,DPP-DTT,290.0,143.0,2.03,spin,0,1,1,6.85,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,
9,25,7.0,7239,DPP-DTT,290.0,143.0,2.03,spin,0,1,1,7.25,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,


We are now going to do something called one hot encoding to convert columns with textual information into numbers (1 and 0)

In [60]:
# List of columns to one-hot encode
#columns_to_one_hot_encode = ['film_deposition_type', 'gate_material', 'dielectric_material', 'electrode_configuration']
columns_to_one_hot_encode = ['film_deposition_type', 'gate_material', 'electrode_configuration']

# Perform one-hot encoding
result_df_hot = pd.get_dummies(result_df_norm, columns=columns_to_one_hot_encode)

# Display the resulting DataFrame with one-hot encoding
result_df_hot


Unnamed: 0,sample_id,citation_type,experiment_meta,solution_concentration,solvent_vol_frac,solvent_pubchem_cid,solvent_iupac_name,polymer_common_name,polymer_iupac_name,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_params,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,channel_length,dielectric_material,dielectric_thickness,dielectric_material_2,dielectric_thickness_2,channel_width.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,dielectric_material_2.1,dielectric_thickness_2.1,film_deposition_type_blade,film_deposition_type_dip,film_deposition_type_drop,film_deposition_type_shear,film_deposition_type_spin,gate_material_Al,gate_material_Au,gate_material_PET,gate_material_glass,gate_material_n-doped Si,gate_material_Al.1,gate_material_Au.1,gate_material_PET.1,gate_material_glass.1,gate_material_n-doped Si.1,electrode_configuration_BGBC,electrode_configuration_BGTC,electrode_configuration_TGBC,electrode_configuration_BGBC.1,electrode_configuration_BGTC.1,electrode_configuration_TGBC.1
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'spin_rate': 1000.0, 'environment': 'air'}",0,0,1,0.11,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'spin_rate': 1500.0, 'spin_time': 60.0, 'envi...",0,1,1,0.29,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",292.2,74.9,3.9,{'environment': 'air'},0,1,1,0.23,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,0.81,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",344.0,71.0,4.85,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,1.53,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",501.0,110.0,4.55,"{'spin_rate': 3000.0, 'spin_time': 60.0, 'envi...",0,0,1,0.9,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",199.0,55.0,3.62,"{'spin_rate': 1000.0, 'spin_time': 60.0}",0,0,1,1.1,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,"{'blade_angle': 90.0, 'environment': 'air', 't...",0,0,1,8.5,,,,,,,,,,,,,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'spin_rate': 2000.0, 'spin_time': 60.0, 'envi...",0,1,1,6.85,4000.0,100.0,SiO2,200.0,,,4000.0,100.0,SiO2,200.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'spin_rate': 12000.0, 'spin_time': 60.0, 'env...",0,1,1,7.25,4000.0,125.0,SiO2,200.0,,,4000.0,125.0,SiO2,200.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0


Now lets see all the columns in this database 

In [61]:
column_names = result_df_hot.columns
print(column_names)


Index(['sample_id', 'citation_type', 'experiment_meta',
       'solution_concentration', 'solvent_vol_frac', 'solvent_pubchem_cid',
       'solvent_iupac_name', 'polymer_common_name', 'polymer_iupac_name',
       'polymer_mw', 'polymer_mn', 'polymer_dispersity',
       'film_deposition_params', 'solution_treatment', 'substrate_pretreat',
       'post_process', 'hole_mobility', 'channel_width', 'channel_length',
       'dielectric_material', 'dielectric_thickness', 'dielectric_material_2',
       'dielectric_thickness_2', 'channel_width', 'channel_length',
       'dielectric_material', 'dielectric_thickness', 'dielectric_material_2',
       'dielectric_thickness_2', 'film_deposition_type_blade',
       'film_deposition_type_dip', 'film_deposition_type_drop',
       'film_deposition_type_shear', 'film_deposition_type_spin',
       'gate_material_Al', 'gate_material_Au', 'gate_material_PET',
       'gate_material_glass', 'gate_material_n-doped Si', 'gate_material_Al',
       'gate_materia

### Consolidating Descriptors

We are going to consolidate some of the descriptors into one column. 


Coating :

* film_deposition_type_MGC (dip,Dip,blade, inkjet, shear, wire) - value of 1 if any of these columns are true or else 0
* film_deposition_type_spin
* film_deposition_type_drop

Gate Material :

* gate_material_n_doped Si = ('gate_material_n-doped Si', 'gate_material_Si','gate_material_p-doped Si') 

* gate_material_other = ('gate_material_Al', 'gate_material_Au', 'gate_material_PEDOT:PSS', 'gate_material_PET','gate_material_glass')


Dielectric Material :

* dielectric_material_SiO2
* dielectric_material_other = (
        'dielectric_material_6FDA-DABC',
       'dielectric_material_CYTOP', 'dielectric_material_PAN',
       'dielectric_material_PMMA', 'dielectric_material_PTrFE',
       'dielectric_material_PVP', 'dielectric_material_Shellac',
       'dielectric_material_Si3N4')




In [62]:
result_df_hot

Unnamed: 0,sample_id,citation_type,experiment_meta,solution_concentration,solvent_vol_frac,solvent_pubchem_cid,solvent_iupac_name,polymer_common_name,polymer_iupac_name,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_params,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,channel_length,dielectric_material,dielectric_thickness,dielectric_material_2,dielectric_thickness_2,channel_width.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,dielectric_material_2.1,dielectric_thickness_2.1,film_deposition_type_blade,film_deposition_type_dip,film_deposition_type_drop,film_deposition_type_shear,film_deposition_type_spin,gate_material_Al,gate_material_Au,gate_material_PET,gate_material_glass,gate_material_n-doped Si,gate_material_Al.1,gate_material_Au.1,gate_material_PET.1,gate_material_glass.1,gate_material_n-doped Si.1,electrode_configuration_BGBC,electrode_configuration_BGTC,electrode_configuration_TGBC,electrode_configuration_BGBC.1,electrode_configuration_BGTC.1,electrode_configuration_TGBC.1
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'spin_rate': 1000.0, 'environment': 'air'}",0,0,1,0.11,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",299.0,90.0,3.32,"{'spin_rate': 1500.0, 'spin_time': 60.0, 'envi...",0,1,1,0.29,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",292.2,74.9,3.9,{'environment': 'air'},0,1,1,0.23,1500.0,80.0,SiO2,300.0,,,1500.0,80.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,0.81,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",344.0,71.0,4.85,"{'blade_angle': 8.0, 'environment': 'air', 'te...",0,0,1,1.53,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,1.0,7964,chlorobenzene,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",501.0,110.0,4.55,"{'spin_rate': 3000.0, 'spin_time': 60.0, 'envi...",0,0,1,0.9,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",199.0,55.0,3.62,"{'spin_rate': 1000.0, 'spin_time': 60.0}",0,0,1,1.1,2000.0,50.0,SiO2,300.0,,,2000.0,50.0,SiO2,300.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,1.0,6212,chloroform,DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",91.0,29.0,3.14,"{'blade_angle': 90.0, 'environment': 'air', 't...",0,0,1,8.5,,,,,,,,,,,,,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'spin_rate': 2000.0, 'spin_time': 60.0, 'envi...",0,1,1,6.85,4000.0,100.0,SiO2,200.0,,,4000.0,100.0,SiO2,200.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,1.0,7239,"1,2-dichlorobenzene",DPP-DTT,"poly[2,5-(2-octyldodecyl)-3,6-diketopyrrolopyr...",290.0,143.0,2.03,"{'spin_rate': 12000.0, 'spin_time': 60.0, 'env...",0,1,1,7.25,4000.0,125.0,SiO2,200.0,,,4000.0,125.0,SiO2,200.0,,,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0


In [63]:
## replacing the pubchem_cid with solvent boiling point

# Get unique PubChem CIDs from the 'solvent_pubchem_cid' column
unique_pubchem_cids = result_df['solvent_pubchem_cid'].unique()

# Display the unique PubChem CIDs
print(unique_pubchem_cids)



[ 7964  6212  7239  6591  1140  7809  7003  7947    13 13229  8030  6344
  7503]


In [64]:
# Dictionary mapping PubChem CIDs to boiling points
boiling_point_dict = {
    7964: 132,
    6212: 62,
    7239: 180.1,
    6591: 146,
    7809: 138,
    13229: 238,
    13: 213,
    8030: 84,
    1140: 111,
    7501: 145,
    241: 80,
    6344: 40,
    7503: 179
}

# Add a new column "solvent_boiling_point" based on PubChem CIDs
result_df['solvent_boiling_point'] = result_df['solvent_pubchem_cid'].map(boiling_point_dict)


# Drop unnecessary columns that won't be used for modeling
columns_to_drop = ['solvent_pubchem_cid']
result_df = result_df.drop(columns=columns_to_drop)

# Display the resulting DataFrame
result_df


Unnamed: 0,sample_id,solution_concentration,polymer_common_name,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_type,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,gate_material,channel_length,dielectric_material,dielectric_thickness,electrode_configuration,dielectric_material_2,dielectric_thickness_2,channel_width.1,gate_material.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,electrode_configuration.1,dielectric_material_2.1,dielectric_thickness_2.1,solvent_boiling_point
0,1,4.0,DPP-DTT,299.0,90.0,3.32,spin,0,0,1,0.11,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,132.0
1,2,4.0,DPP-DTT,299.0,90.0,3.32,spin,0,1,1,0.29,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,132.0
2,3,5.0,DPP-DTT,292.2,74.9,3.9,spin,0,1,1,0.23,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,62.0
3,12,5.0,DPP-DTT,91.0,29.0,3.14,blade,0,0,1,0.81,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
4,13,5.0,DPP-DTT,344.0,71.0,4.85,blade,0,0,1,1.53,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
5,14,5.0,DPP-DTT,501.0,110.0,4.55,spin,0,0,1,0.9,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
6,15,4.0,DPP-DTT,199.0,55.0,3.62,spin,0,0,1,1.1,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,180.1
7,23,6.5,DPP-DTT,91.0,29.0,3.14,blade,0,0,1,8.5,,,,,,,,,,,,,,,,,62.0
8,24,3.0,DPP-DTT,290.0,143.0,2.03,spin,0,1,1,6.85,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,180.1
9,25,7.0,DPP-DTT,290.0,143.0,2.03,spin,0,1,1,7.25,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,180.1


In [65]:
column_names = result_df.columns
print(column_names)

Index(['sample_id', 'solution_concentration', 'polymer_common_name',
       'polymer_mw', 'polymer_mn', 'polymer_dispersity',
       'film_deposition_type', 'solution_treatment', 'substrate_pretreat',
       'post_process', 'hole_mobility', 'channel_width', 'gate_material',
       'channel_length', 'dielectric_material', 'dielectric_thickness',
       'electrode_configuration', 'dielectric_material_2',
       'dielectric_thickness_2', 'channel_width', 'gate_material',
       'channel_length', 'dielectric_material', 'dielectric_thickness',
       'electrode_configuration', 'dielectric_material_2',
       'dielectric_thickness_2', 'solvent_boiling_point'],
      dtype='object')


In [66]:
result_df_P3HT = result_df[result_df['polymer_common_name'] == 'P3HT']
result_df_DPP_DTT = result_df[result_df['polymer_common_name'] != 'P3HT']


### P3HT Modeling

In [67]:
result_df_P3HT = result_df_P3HT.drop(columns='polymer_common_name')
result_df_P3HT

Unnamed: 0,sample_id,solution_concentration,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_type,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,gate_material,channel_length,dielectric_material,dielectric_thickness,electrode_configuration,dielectric_material_2,dielectric_thickness_2,channel_width.1,gate_material.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,electrode_configuration.1,dielectric_material_2.1,dielectric_thickness_2.1,solvent_boiling_point
97,190,10.0,42.0,23.3,1.8,,0,0,1,0.001,500.0,Au,50.0,PMMA,650.0,TGBC,,,500.0,Au,50.0,PMMA,650.0,TGBC,,,213.0
109,169,4.0,47.7,24.0,1.9875,,0,0,1,0.072672,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,62.0
110,170,3.5,47.7,24.0,1.9875,,0,0,1,0.063779,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,62.0
111,171,3.5,23.0,12.7,1.811024,,0,0,1,0.027642,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,62.0
112,172,10.0,,65.5,,,0,0,1,0.046912,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,62.0
113,173,10.0,,37.0,,,0,1,1,0.011943,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,62.0
114,174,10.0,,37.0,,,0,1,1,0.04212,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,138.0
115,175,10.0,,37.0,,,0,1,1,0.148004,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,238.0
116,176,10.0,,37.0,,,0,1,1,0.111808,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,213.0
117,177,10.0,,37.0,,,0,1,1,0.163626,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,84.0


In [68]:
num_rows, num_columns = result_df_P3HT.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")


Number of rows: 79
Number of columns: 27


### DPP-DTT Modeling

In [69]:
result_df_DPP_DTT = result_df_DPP_DTT.drop(columns='polymer_common_name')
result_df_DPP_DTT

Unnamed: 0,sample_id,solution_concentration,polymer_mw,polymer_mn,polymer_dispersity,film_deposition_type,solution_treatment,substrate_pretreat,post_process,hole_mobility,channel_width,gate_material,channel_length,dielectric_material,dielectric_thickness,electrode_configuration,dielectric_material_2,dielectric_thickness_2,channel_width.1,gate_material.1,channel_length.1,dielectric_material.1,dielectric_thickness.1,electrode_configuration.1,dielectric_material_2.1,dielectric_thickness_2.1,solvent_boiling_point
0,1,4.0,299.0,90.0,3.32,spin,0,0,1,0.11,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,132.0
1,2,4.0,299.0,90.0,3.32,spin,0,1,1,0.29,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,132.0
2,3,5.0,292.2,74.9,3.9,spin,0,1,1,0.23,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,1500.0,n-doped Si,80.0,SiO2,300.0,BGTC,,,62.0
3,12,5.0,91.0,29.0,3.14,blade,0,0,1,0.81,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
4,13,5.0,344.0,71.0,4.85,blade,0,0,1,1.53,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
5,14,5.0,501.0,110.0,4.55,spin,0,0,1,0.9,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,132.0
6,15,4.0,199.0,55.0,3.62,spin,0,0,1,1.1,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,2000.0,n-doped Si,50.0,SiO2,300.0,BGBC,,,180.1
7,23,6.5,91.0,29.0,3.14,blade,0,0,1,8.5,,,,,,,,,,,,,,,,,62.0
8,24,3.0,290.0,143.0,2.03,spin,0,1,1,6.85,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,100.0,SiO2,200.0,BGTC,,,180.1
9,25,7.0,290.0,143.0,2.03,spin,0,1,1,7.25,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,4000.0,n-doped Si,125.0,SiO2,200.0,BGTC,,,180.1


In [70]:
num_rows, num_columns = result_df_DPP_DTT.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")


Number of rows: 139
Number of columns: 27


In [None]:
###EXTRACTION PART2
# make each solution processing, substarte pretreat and post processing type into individual column
# have the step number in each type. if done mixing first then 1 second step should have 2

In [None]:
# 1. determine what process/treatment is possible in each(start with 1 then expand)
# 2. make a table for each procee/treatment
# 3. input a 1 or 2 or etc for the step each treatment was done, 0 if not done

In [95]:
query = '''
    SELECT
        s.sample_id,
        spo.process_order AS substrate_pretreat_order,
        spstep.treatment_type AS substrate_pretreat_type,
        sto.process_order AS solution_treatment_order,
        ststep.treatment_type AS solution_treatment_type,
        ppo.process_order AS post_process_order,
        ppstep.treatment_type AS post_process_type
    FROM
        SAMPLE s
    JOIN
        OFET_PROCESS of ON s.process_id = of.process_id
    LEFT JOIN 
        SUBSTRATE_PRETREAT_ORDER spo ON of.substrate_pretreat_id = spo.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP spstep ON spstep.substrate_pretreat_step_id = spo.substrate_pretreat_step_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto ON of.solution_treatment_id = sto.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP ststep ON sto.solution_treatment_step_id = ststep.solution_treatment_step_id
    LEFT JOIN
        POSTPROCESS_ORDER ppo ON of.postprocess_id = ppo.postprocess_id
    LEFT JOIN
        POSTPROCESS_STEP ppstep ON ppo.postprocess_step_id = ppstep.postprocess_step_id
    '''

#join order and step through step id

result_df_order = read_select_query(query)

result_df_order

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,substrate_pretreat_order,substrate_pretreat_type,solution_treatment_order,solution_treatment_type,post_process_order,post_process_type
0,1,,,,,1.0,annealing
1,2,1.0,sam,,,1.0,annealing
2,3,1.0,sam,,,1.0,annealing
3,4,1.0,sam,,,1.0,annealing
4,5,1.0,sam,,,1.0,annealing
5,6,2.0,sam,,,1.0,annealing
6,6,1.0,sam,,,1.0,annealing
7,7,2.0,sam,,,1.0,annealing
8,7,1.0,sam,,,1.0,annealing
9,8,2.0,sam,,,1.0,annealing


In [98]:
query = '''
   SELECT
    s.sample_id,
    COALESCE(MAX(CASE WHEN spstep.treatment_type = 'sam' THEN spo.process_order END), 0) AS substrate_pretreat_sam,
    COALESCE(MAX(CASE WHEN spstep.treatment_type = 'plasma' THEN spo.process_order END), 0) AS substrate_pretreat_plasma,
    COALESCE(MAX(CASE WHEN spstep.treatment_type = 'uv_ozone' THEN spo.process_order END), 0) AS substrate_pretreat_uv_ozone,
    
    COALESCE(MAX(CASE WHEN ststep.treatment_type = 'poor_solvent' THEN sto.process_order END), 0) AS solution_treatment_poor_solvent,
    COALESCE(MAX(CASE WHEN ststep.treatment_type = 'aging' THEN sto.process_order END), 0) AS solution_treatment_aging,
    COALESCE(MAX(CASE WHEN ststep.treatment_type = 'sonication' THEN sto.process_order END), 0) AS solution_treatment_sonication,
    COALESCE(MAX(CASE WHEN ststep.treatment_type = 'mixing' THEN sto.process_order END), 0) AS solution_treatment_mixing,
    COALESCE(MAX(CASE WHEN ststep.treatment_type = 'uv_irradiation' THEN sto.process_order END), 0) AS solution_treatment_uv_irradiation,
    
    COALESCE(MAX(CASE WHEN ppstep.treatment_type = 'annealing' THEN ppo.process_order END), 0) AS post_process_annealing,
    COALESCE(MAX(CASE WHEN ppstep.treatment_type = 'drying' THEN ppo.process_order END), 0) AS post_process_drying,
    COALESCE(MAX(CASE WHEN ppstep.treatment_type = 'chemical_treat' THEN ppo.process_order END), 0) AS post_process_chemical



     FROM
        SAMPLE s
    JOIN
        OFET_PROCESS of ON s.process_id = of.process_id
    LEFT JOIN 
        SUBSTRATE_PRETREAT_ORDER spo ON of.substrate_pretreat_id = spo.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP spstep ON spstep.substrate_pretreat_step_id = spo.substrate_pretreat_step_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto ON of.solution_treatment_id = sto.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP ststep ON sto.solution_treatment_step_id = ststep.solution_treatment_step_id
    LEFT JOIN
        POSTPROCESS_ORDER ppo ON of.postprocess_id = ppo.postprocess_id
    LEFT JOIN
        POSTPROCESS_STEP ppstep ON ppo.postprocess_step_id = ppstep.postprocess_step_id
    GROUP BY
        s.sample_id
    ORDER BY
        s.sample_id ASC;


'''

# Use the read_select_query function to execute the query
result_df_order = read_select_query(query)

result_df_order    

  df = pd.read_sql_query(query, conn)


Unnamed: 0,sample_id,substrate_pretreat_sam,substrate_pretreat_plasma,substrate_pretreat_uv_ozone,solution_treatment_poor_solvent,solution_treatment_aging,solution_treatment_sonication,solution_treatment_mixing,solution_treatment_uv_irradiation,post_process_annealing,post_process_drying,post_process_chemical
0,1,0,0,0,0,0,0,0,0,1,0,0
1,2,1,0,0,0,0,0,0,0,1,0,0
2,3,1,0,0,0,0,0,0,0,1,0,0
3,4,1,0,0,0,0,0,0,0,1,0,0
4,5,1,0,0,0,0,0,0,0,1,0,0
5,6,2,0,0,0,0,0,0,0,1,0,0
6,7,2,0,0,0,0,0,0,0,1,0,0
7,8,2,0,0,0,0,0,0,0,1,0,0
8,9,2,0,0,0,0,0,0,0,1,0,0
9,10,2,0,0,0,0,0,0,0,1,0,0


In [None]:
##read tpot instruction and download on computer