## Exercise: Extracting data from OFET-DB to perform ML

It's great that our data is stored in a standardized manner in the database, but to perform ML we need to extract it as a table containing descriptors [X] and a property [y]. This notebook focuses on how to do that.

First, restore the backup file called 20231206_ofetdb_v2_backup11 in pgAdmin as a practice database. We will avoid working with the original database.

In [2]:
# Connect to the database


import psycopg2
import pandas as pd
import numpy as np
import plotly.express as px


#sample connection details
# pgparams = {
#     "host": "127.0.0.1",
#     "database": "ofetdb_testenv",
#     "user": "postgres",
#     "password": "password",
#     "port": "5432",
# }


# Set max number of displayed columns and rows in Jupyter Notebook
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


pgparams = {
    "host": "chbe-ofet-db.postgres.database.azure.com",
    "database": "ofetdb_v2_test",
    "user": "mg200_ofetdb",
    "password": "MGEROFETDB23!",
    "port": "5432",
}

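def read_select_query(query):
    """Run a SELECT query against the database and return the result as a DataFrame."""
    # psycopg2's connection context manager only wraps a transaction;
    # it does not close the connection, so close it explicitly.
    conn = psycopg2.connect(**pgparams)
    try:
        df = pd.read_sql_query(query, conn)
    finally:
        conn.close()
    return df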
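
Hardcoding a password in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the connection parameters could instead be read from environment variables. The variable names `OFETDB_HOST`, `OFETDB_NAME`, `OFETDB_USER`, `OFETDB_PASSWORD` and `OFETDB_PORT` and the helper `load_pgparams` are hypothetical and not part of the original notebook; the defaults are the sample connection details shown above.

```python
import os


def load_pgparams(env=None):
    """Build psycopg2 connection parameters from environment variables.

    The OFETDB_* variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OFETDB_HOST", "127.0.0.1"),
        "database": env.get("OFETDB_NAME", "ofetdb_testenv"),
        "user": env.get("OFETDB_USER", "postgres"),
        # No default for the password: fail loudly if it is missing
        "password": env["OFETDB_PASSWORD"],
        "port": env.get("OFETDB_PORT", "5432"),
    }


# Example with an explicit mapping instead of the real environment
example_params = load_pgparams({"OFETDB_PASSWORD": "not-a-real-password"})
print(example_params["host"], example_params["port"])
```

With this in place, `pgparams = load_pgparams()` would replace the literal dictionary, and the password never appears in the notebook itself.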


## Simplest Scenario:

- Assume single-polymer samples only, no blends for now (wt_frac = 1)
- Assume single-solvent samples only, no solvent mixtures (vol_frac = 1)
- Show device substrate information
- Show the film deposition type (spin, blade, etc.). We will not use the deposition parameters for now
- Don't go into the details of solution treatment, substrate pretreatment and post-processing. Just show whether a treatment was performed
- Show hole mobility information

Follow the code blocks below and we will eventually end up with a table containing this information.

1. Prepare a dataframe containing the sample_id, citation_type and meta information from the EXPERIMENT_INFO table (this one is done for you as practice)

In [3]:
## Adding all the experiment information 

# SQL query to fetch the required data
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id;
'''

# Use the read_select_query function to execute the query
result_df = read_select_query(query)

# Display the resulting DataFrame
#print(result_df)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ..."


Remember to keep extending the existing query to add more information to the table. FYI, it is going to get long and complicated soon.
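
To see what each added JOIN does without touching the real database, the same pattern can be tried on toy tables. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the table and column names follow the query above, but the rows are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database as a stand-in for the practice database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiment_info (exp_id INTEGER PRIMARY KEY, citation_type TEXT);
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, exp_id INTEGER);
    INSERT INTO experiment_info VALUES (1, 'literature'), (2, 'lab');
    INSERT INTO sample VALUES (1, 1), (2, 1), (3, 2);
""")

toy_query = """
    SELECT s.sample_id, e.citation_type
    FROM sample AS s
    JOIN experiment_info AS e ON s.exp_id = e.exp_id
    ORDER BY s.sample_id;
"""

# Each sample row picks up the columns of its matching experiment row
toy_df = pd.read_sql_query(toy_query, conn)
conn.close()
print(toy_df)
```

Every further JOIN in the steps below follows the same idea: match a key in the table built so far against a key in the next table and pull in the new columns.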

2. Now add the solution concentration information to this table

In [4]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id;
'''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0


3. Now add the solvent information to this table, but only consider devices made from a single solvent (vol_frac = 1)

In [5]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    WHERE
        sms.vol_frac=1;
'''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
3,4,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
4,5,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
5,6,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
6,7,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
7,8,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
8,9,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
9,10,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
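
The `vol_frac = 1` condition works because a solution made from a solvent mixture has one row per solvent in SOLUTION_MAKEUP_SOLVENT, each with a fraction below 1, while a single-solvent solution has exactly one row with a fraction of 1. A toy pandas illustration follows; the values and the `solvent_id` column are invented for this sketch.

```python
import pandas as pd

# Toy stand-in for SOLUTION_MAKEUP_SOLVENT: solution 2 is a two-solvent mixture
makeup = pd.DataFrame({
    "solution_id": [1, 2, 2, 3],
    "solvent_id": [10, 10, 11, 12],  # hypothetical column
    "vol_frac": [1.0, 0.8, 0.2, 1.0],
})

# Keep only rows describing a pure solvent; mixtures drop out entirely
single_solvent = makeup[makeup["vol_frac"] == 1]
print(single_solvent["solution_id"].tolist())
```

The same reasoning applies to `wt_frac = 1` for polymer blends.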


4. Now add polymer information to this table, only for devices made from one polymer, no blends (wt_frac = 1)

In [6]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    JOIN
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    WHERE
        sms.vol_frac=1 and smp.wt_frac=1;
'''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0
7,23,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5
8,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0
9,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0


5. Now to this table add the device substrate information 

In [7]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        df.params
        
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    JOIN
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    JOIN
        OFET_PROCESS op on s.process_id = op.process_id
    JOIN
        DEVICE_FABRICATION df on op.device_fab_id = df.device_fab_id
    WHERE
        sms.vol_frac=1 and smp.wt_frac=1;
    '''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,params
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,"{'channel_width': 1500.0, 'gate_material': 'n-..."
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,"{'channel_width': 2000.0, 'gate_material': 'n-..."
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,"{'channel_width': 4000.0, 'gate_material': 'n-..."
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,"{'channel_width': 4000.0, 'gate_material': 'n-..."
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,"{'channel_width': 1400.0, 'gate_material': 'n-..."


6. Now add the film deposition type to this table, without the parameters associated with it

In [8]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        df.params,
        fd.deposition_type
        
    FROM
        SAMPLE as s
    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    JOIN
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    JOIN
        OFET_PROCESS op on s.process_id = op.process_id
    JOIN
        DEVICE_FABRICATION df on op.device_fab_id = df.device_fab_id
    JOIN
        FILM_DEPOSITION fd on fd.film_deposition_id = op.film_deposition_id
    WHERE
        sms.vol_frac=1 and smp.wt_frac=1;
    '''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,params,deposition_type
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,"{'channel_width': 1400.0, 'gate_material': 'n-...",spin


7. Now add the solution treatment, substrate pretreatment and post-process information to this table

Use 1 if a treatment was performed and 0 if not.



In [9]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        df.params,
        fd.deposition_type,
        CASE
            WHEN sts.treatment_type IS NOT NULL THEN '1'
            ELSE '0'
        END AS solution_treatment,
        CASE
            WHEN sps.treatment_type IS NOT NULL THEN '1'
            ELSE '0'
        END AS substrate_pretreatment
    
    FROM
        SAMPLE as s

    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    JOIN
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    JOIN
        OFET_PROCESS op on s.process_id = op.process_id
    JOIN
        DEVICE_FABRICATION df on op.device_fab_id = df.device_fab_id
    JOIN
        FILM_DEPOSITION fd on fd.film_deposition_id = op.film_deposition_id
    LEFT JOIN
        SOLUTION_TREATMENT st on st.solution_treatment_id = op.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto on sto.solution_treatment_id =
        st.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP sts on sts.solution_treatment_step_id = 
        sto.solution_treatment_step_id
    LEFT JOIN
        SUBSTRATE_PRETREAT sp on sp.substrate_pretreat_id = 
        op.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_ORDER spo on spo.substrate_pretreat_id =
        sp.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP sps on sps.substrate_pretreat_step_id = 
        spo.substrate_pretreat_step_id
    WHERE
        sms.vol_frac=1 and smp.wt_frac=1;
    
    '''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,params,deposition_type,solution_treatment,substrate_pretreatment
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,"{'channel_width': 1400.0, 'gate_material': 'n-...",spin,0,1
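
The treatment flags depend on LEFT JOIN keeping samples that have no matching treatment row: for those, the joined columns are NULL and the CASE expression falls through to ELSE. A small sketch on an in-memory SQLite database shows the mechanism; the toy tables `process` and `treatment_step` are hypothetical simplifications of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE process (process_id INTEGER, treatment_id INTEGER);
    CREATE TABLE treatment_step (treatment_id INTEGER, treatment_type TEXT);
    INSERT INTO process VALUES (1, 100), (2, NULL);
    INSERT INTO treatment_step VALUES (100, 'sonication');
""")

# Process 2 has no treatment, so its joined treatment_type is NULL
rows = conn.execute("""
    SELECT
        p.process_id,
        CASE WHEN t.treatment_type IS NOT NULL THEN 1 ELSE 0 END AS treated
    FROM process AS p
    LEFT JOIN treatment_step AS t ON t.treatment_id = p.treatment_id
    ORDER BY p.process_id;
""").fetchall()
conn.close()

print(rows)
```

Note that when a treatment consists of several steps, joining through the step table returns one row per step, so the same sample can appear more than once and may need to be collapsed afterwards.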


8. Now add the hole mobility information to this table

Only keep devices that have an actual hole mobility value (not NULL or NaN)

In [16]:
query = '''
    SELECT
        s.sample_id,
        e.citation_type,
        e.meta as experiment_meta,
        sol.concentration,
        df.params,
        fd.deposition_type,
        CASE
            WHEN sts.treatment_type IS NOT NULL THEN '1'
            ELSE '0'
        END AS solution_treatment,
        CASE
            WHEN sps.treatment_type IS NOT NULL THEN '1'
            ELSE '0'
        END AS substrate_pretreatment,
    CAST (m.data -> 'hole mobility' ->> 'value' as float) as hole_mobility
    
    FROM
        SAMPLE as s

    JOIN
        EXPERIMENT_INFO as e ON s.exp_id = e.exp_id
    JOIN
        SOLUTION_MAKEUP_SOLVENT sms ON s.process_id = sms.solution_id
    JOIN
        SOLUTION sol ON sms.solution_id = sol.solution_id
    JOIN
        SOLUTION_MAKEUP_POLYMER smp ON s.process_id = smp.solution_id
    JOIN
        OFET_PROCESS op on s.process_id = op.process_id
    JOIN
        DEVICE_FABRICATION df on op.device_fab_id = df.device_fab_id
    JOIN
        FILM_DEPOSITION fd on fd.film_deposition_id = op.film_deposition_id
    LEFT JOIN
        SOLUTION_TREATMENT st on st.solution_treatment_id = op.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_ORDER sto on sto.solution_treatment_id =
        st.solution_treatment_id
    LEFT JOIN
        SOLUTION_TREATMENT_STEP sts on sts.solution_treatment_step_id = 
        sto.solution_treatment_step_id
    LEFT JOIN
        SUBSTRATE_PRETREAT sp on sp.substrate_pretreat_id = 
        op.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_ORDER spo on spo.substrate_pretreat_id =
        sp.substrate_pretreat_id
    LEFT JOIN
        SUBSTRATE_PRETREAT_STEP sps on sps.substrate_pretreat_step_id = 
        spo.substrate_pretreat_step_id
    LEFT JOIN
        MEASUREMENT m on s.sample_id = m.sample_id
    WHERE
        sms.vol_frac=1 and smp.wt_frac=1;
    
    '''

result_df = read_select_query(query)

result_df



Unnamed: 0,sample_id,citation_type,experiment_meta,concentration,params,deposition_type,solution_treatment,substrate_pretreatment,hole_mobility
0,1,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1,
1,2,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",4.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1,
2,3,literature,"{'doi': '10.1039/C5TC02579F', 'publication_typ...",5.0,"{'channel_width': 1500.0, 'gate_material': 'n-...",spin,0,1,
3,12,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1,
4,13,literature,"{'doi': '10.1021/acs.chemmater.7b03019', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1,
5,14,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",5.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1,
6,15,literature,"{'doi': '10.1021/acs.chemmater.8b05224', 'publ...",4.0,"{'channel_width': 2000.0, 'gate_material': 'n-...",spin,0,1,
7,24,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",3.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1,
8,25,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",7.0,"{'channel_width': 4000.0, 'gate_material': 'n-...",spin,0,1,
9,26,literature,"{'doi': '10.1038/srep00754 ', 'publication_typ...",6.5,"{'channel_width': 1400.0, 'gate_material': 'n-...",spin,0,1,
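
The hole_mobility column above still contains missing values. One way to keep only samples with a usable mobility, sketched here on a toy frame with invented values, is to coerce the column to numeric and drop the rows that end up as NaN. Filtering in SQL with an `IS NOT NULL` condition in the WHERE clause would be an alternative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "sample_id": [1, 2, 3, 4],
    "hole_mobility": [0.12, None, np.nan, 0.05],
})

# Coerce to numeric (non-numeric entries become NaN), then drop missing rows
toy["hole_mobility"] = pd.to_numeric(toy["hole_mobility"], errors="coerce")
toy_clean = toy.dropna(subset=["hole_mobility"]).reset_index(drop=True)
print(toy_clean)
```

Applied to the real table, the equivalent call would be `result_df.dropna(subset=['hole_mobility'])`.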


This is a simple version of the dataframe result_df. There are a few more things we can do to the dataframe to make it more machine readable. We no longer need SQL queries; from here on we can work with the dataframe in pandas.

9. The code block below unpacks the device substrate parameters stored in the params column and stores them as separate columns (this one is done for you)

In [11]:
# Unpack the device substrate parameters stored in the 'params' column

import pandas as pd
from pandas import json_normalize

# Extract the 'params' column and normalize it into one column per key
params_df = json_normalize(result_df['params'])

# Concatenate the original DataFrame with the new 'params_df'
result_df = pd.concat([result_df, params_df], axis=1)

# Drop the original 'params' column
result_df = result_df.drop('params', axis=1)

# Display the resulting DataFrame
result_df
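
A toy illustration of what this unpacking does, assuming each entry in the column is a Python dict. The keys `channel_width` and `gate_material` appear in the query output above; the values here are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "sample_id": [1, 2],
    "params": [
        {"channel_width": 1500.0, "gate_material": "n-doped Si"},
        {"channel_width": 2000.0, "gate_material": "Au"},
    ],
})

# One column per dictionary key, one row per sample
params_df = pd.json_normalize(toy["params"].tolist())

# Both frames share the default 0..n-1 index, so concat aligns row by row
toy_flat = pd.concat([toy.drop(columns="params"), params_df], axis=1)
print(toy_flat.columns.tolist())
```

If the dataframe has been filtered beforehand, its index is no longer 0..n-1 and the concatenation would misalign rows; calling `reset_index(drop=True)` first avoids that.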

In [None]:

# Drop columns that won't be used for modeling
columns_to_drop = [
    'film_deposition_params', 'citation_type', 'experiment_meta',
    'solvent_vol_frac', 'solvent_iupac_name', 'polymer_iupac_name',
    'dielectric_material_2', 'dielectric_thickness_2',
    'dielectric_1_material', 'dielectric_1_thickness', 'substrate_material',
]
# errors='ignore' skips any listed column that is not present in result_df
result_df = result_df.drop(columns=columns_to_drop, errors='ignore')

# Display the resulting DataFrame
result_df


We are now going to use one-hot encoding to convert columns containing text into numeric columns (1 and 0)

In [None]:
# List of columns to one-hot encode
columns_to_one_hot_encode = ['film_deposition_type', 'gate_material', 'dielectric_material', 'electrode_configuration']

# Perform one-hot encoding
result_df = pd.get_dummies(result_df, columns=columns_to_one_hot_encode)

# Display the resulting DataFrame with one-hot encoding
result_df
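
As a quick illustration of what `pd.get_dummies` produces, here is a toy frame with invented rows. Passing `dtype=int` is an optional choice made for this sketch so that the new columns hold 0/1 rather than booleans.

```python
import pandas as pd

toy = pd.DataFrame({
    "sample_id": [1, 2, 3],
    "deposition_type": ["spin", "blade", "spin"],
})

# Each distinct value becomes its own column named <column>_<value>
encoded = pd.get_dummies(toy, columns=["deposition_type"], dtype=int)
print(encoded)
```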


Now let's see all the columns in this dataframe

In [None]:
column_names = result_df.columns
print(column_names)


### Consolidating Descriptors

We are going to consolidate some of the descriptors into one column. 


Coating:

* film_deposition_type_MGC = (dip, Dip, blade, inkjet, shear, wire): 1 if any of these columns is true, otherwise 0
* film_deposition_type_spin
* film_deposition_type_drop

Gate Material:

* gate_material_n_doped Si = ('gate_material_n-doped Si', 'gate_material_Si', 'gate_material_p-doped Si')
* gate_material_other = ('gate_material_Al', 'gate_material_Au', 'gate_material_PEDOT:PSS', 'gate_material_PET', 'gate_material_glass')

Dielectric Material:

* dielectric_material_SiO2
* dielectric_material_other = ('dielectric_material_6FDA-DABC', 'dielectric_material_CYTOP', 'dielectric_material_PAN', 'dielectric_material_PMMA', 'dielectric_material_PTrFE', 'dielectric_material_PVP', 'dielectric_material_Shellac', 'dielectric_material_Si3N4')
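
One possible way to implement the consolidation described above is a small helper that ORs a group of one-hot columns into a single 0/1 column and drops the originals. The helper name `consolidate` is hypothetical, and the column names in the example assume the one-hot prefix `film_deposition_type_`; this is a sketch on a toy frame, not the notebook's reference solution.

```python
import pandas as pd


def consolidate(df, new_col, source_cols):
    """Collapse several one-hot columns into one 0/1 column and drop the sources.

    Columns listed in source_cols but absent from df are simply skipped.
    """
    present = [c for c in source_cols if c in df.columns]
    out = df.drop(columns=present)
    if present:
        # 1 if any of the source columns is set for that row, else 0
        out[new_col] = df[present].astype(bool).any(axis=1).astype(int)
    else:
        out[new_col] = 0
    return out


# Toy one-hot frame with invented rows
toy = pd.DataFrame({
    "film_deposition_type_blade": [1, 0, 0],
    "film_deposition_type_dip": [0, 0, 1],
    "film_deposition_type_spin": [0, 1, 0],
})

mgc_cols = [f"film_deposition_type_{k}"
            for k in ("dip", "Dip", "blade", "inkjet", "shear", "wire")]
toy = consolidate(toy, "film_deposition_type_MGC", mgc_cols)
print(toy)
```

The same helper could be called again with the gate material and dielectric material groups.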




In [None]:
result_df

In [None]:
## replacing the pubchem_cid with solvent boiling point

# Get unique PubChem CIDs from the 'solvent_pubchem_cid' column
unique_pubchem_cids = result_df['solvent_pubchem_cid'].unique()

# Display the unique PubChem CIDs
print(unique_pubchem_cids)



In [None]:
# Dictionary mapping PubChem CIDs to solvent boiling points (in °C)
boiling_point_dict = {
    7964: 132,
    6212: 62,
    7239: 180.1,
    6591: 146,
    7809: 138,
    13229: 238,
    13: 213,
    8030: 84,
    1140: 111,
    7501: 145,
    241: 80,
    6344: 40,
    7503: 179
}

# Add a new column "solvent_boiling_point" based on PubChem CIDs
result_df['solvent_boiling_point'] = result_df['solvent_pubchem_cid'].map(boiling_point_dict)


# Drop unnecessary columns that won't be used for modeling
columns_to_drop = ['solvent_pubchem_cid']
result_df = result_df.drop(columns=columns_to_drop)

# Display the resulting DataFrame
result_df
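
`Series.map` silently returns NaN for any CID missing from the dictionary, which would later look like missing data. A small check, sketched on toy values (the CID 99999 is made up to trigger the case):

```python
import pandas as pd

boiling_points = {7964: 132, 6212: 62}  # subset of the dictionary above

cids = pd.Series([7964, 6212, 99999], name="solvent_pubchem_cid")
mapped = cids.map(boiling_points)

# CIDs that did not receive a boiling point
unmapped = sorted(cids[mapped.isna()].unique().tolist())
print(unmapped)
```

On the real table, this check would have to run before the `solvent_pubchem_cid` column is dropped.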


In [None]:
column_names = result_df.columns
print(column_names)

In [None]:
result_df_P3HT = result_df[result_df['polymer_common_name'] == 'P3HT']
result_df_DPP_DTT = result_df[result_df['polymer_common_name'] != 'P3HT']


### P3HT Modeling

In [None]:
result_df_P3HT = result_df_P3HT.drop(columns='polymer_common_name')
result_df_P3HT

In [None]:
num_rows, num_columns = result_df_P3HT.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")


### DPP-DTT Modeling

In [None]:
result_df_DPP_DTT = result_df_DPP_DTT.drop(columns='polymer_common_name')
result_df_DPP_DTT

In [None]:
num_rows, num_columns = result_df_DPP_DTT.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_columns}")
