
## **Intraobserver error**
This notebook compares two rounds of data collection (round1 and round2) scored by a single observer to assess intraobserver error. Some outputs have been removed to protect PII.

### Import libraries

In [None]:
pip install scipy pandas seaborn pingouin

In [None]:
import pandas as pd
import scipy
import numpy as np
import seaborn as sns
# import pingouin as pt
import os

In [None]:
!pip install --upgrade openpyxl



### Set print options

In [None]:
import sys

In [None]:
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [None]:
np.set_printoptions(threshold=sys.maxsize)

### Set export

In [None]:
from google.colab import drive
drive.mount('/drive')

Mounted at /drive


# --- Cranial nonmetrics and macromorphoscopics ---

### Import data

### **Input data files:**
*   *cnm_intraobserver_round1.xlsx* and *cnm_intraobserver_round2.xlsx*
*   These files were created from the original Microsoft Access database and include all variables for the two rounds of data collection intended for intraobserver error analysis.
*   The scores in the round1 and round2 files were not formatted consistently. As a workaround, a dummy row filled with 9s was added to each file in Excel. In this notebook, the 9s are replaced with np.nan so that every score is stored with one decimal place (i.e., as a float), which keeps the two rounds compatible for the Cohen's kappa test. The dummy row (index 7) is then dropped.
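
The effect of the 9-to-NaN workaround can be illustrated on toy data. This is a minimal sketch, assuming the underlying problem was integer-typed score columns; the frame `scores` and its column names are hypothetical and are not the study data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for one round of scores (hypothetical values, not the study data).
scores = pd.DataFrame({"trait_a": [0, 1, 2], "trait_b": [1, 0, 1]})

# Dummy row of 9s, like the one appended in Excel (it gets index label 3 here).
dummy = pd.DataFrame({"trait_a": [9], "trait_b": [9]})
with_dummy = pd.concat([scores, dummy], ignore_index=True)
print(with_dummy.dtypes)  # integer columns at this point

# Replacing 9 with NaN upcasts the columns to float64, because NaN is a float.
as_float = with_dummy.replace(9, np.nan)

# Drop the dummy row by its index label.
cleaned = as_float.drop(3)
print(cleaned.dtypes)
```

Note that `replace(9, np.nan)` acts on every cell, so it is only safe when 9 is not a valid score or identifier anywhere in the frame.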




In [None]:
round1 = pd.read_excel('cnm_intraobserver_round1.xlsx')
round1.head(10)

In [None]:
round2 = pd.read_excel('cnm_intraobserver_round2.xlsx')
round2.head(10)

Replace any value of 9 with NaN.

In [None]:
round1.replace(9, np.nan, inplace=True)

In [None]:
round1.head(10)

In [None]:
round2.replace(9, np.nan, inplace=True)

In [None]:
round2.head(10)

Drop row 7, the dummy row, which is now blank (all NaN).

In [None]:
round1 = round1.drop(7)

In [None]:
round1.head(10)

In [None]:
round2 = round2.drop(7)

In [None]:
round2.head(10)
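
As an alternative to hard-coding the index label 7, the all-NaN row could be located and dropped by its content. This is only a sketch on a hypothetical frame `df`; it assumes the dummy row is the only row in which every column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose last row is entirely NaN (the former dummy row).
df = pd.DataFrame({
    "trait_a": [0.0, 1.0, np.nan],
    "trait_b": [1.0, np.nan, np.nan],
})

# Index labels of rows in which every value is missing.
all_nan_rows = df.index[df.isna().all(axis="columns")]
print(list(all_nan_rows))

# Drop only those rows; row 1 survives because it still has one valid score.
df = df.dropna(how="all")
print(df)
```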

#### Explore data

In [None]:
round1_colnames = list(round1.columns)
for c in round1_colnames:
    print(c)

SkelID
Collection
Supraorbital notch R
Supraorbital notch L
Supraorbital foramen R
Supraorbital foramen L
Infraorbital suture R
Infraorbital suture L
Multiple infraorbital foramina R
Multiple infraorbital foramina L
Zygomatico-facial foramina R
Zygomatico-facial foramina L
Condylar canal R
Condylar canal L
Flexure of sup sagittal sulcus
Foramen ovale incomplete R
Foramen ovale incomplete L
Foramen spinosum incomplete R
Foramen spinosum incomplete L
Pterygo-spinous bridge R
Pterygo-spinous bridge L
Pterygo-alar bridge R
Pterygo-alar bridge L
Tympanic dihiscence R
Tympanic dihiscence L
Auditory exostosis R
Auditory exostosis L
Anterior nasal spine
Inferior nasal aperture
Interorbital breadth
Malar tubercle R
Malar tubercle L
Nasal aperture shape
Nasal aperture width
Nasal bone contour
Nasal bone shape
Nasal overgrowth R
Nasal overgrowth L
Nasofrontal suture
Orbital shape
Postbregmatic depression
Posterior zygomatic tubercle R
Posterior zygomatic tubercle L
Supranasal suture
Transverse pa

##### Drop columns

In [None]:
# drop the 'Collection', 'SkelID', and 'Lamb sut oss' columns in round1

round1 = round1.drop(columns=['Collection', 'SkelID', 'Lamb sut oss'])
round1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 88 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Supraorbital notch R              7 non-null      float64
 1   Supraorbital notch L              7 non-null      float64
 2   Supraorbital foramen R            7 non-null      float64
 3   Supraorbital foramen L            7 non-null      float64
 4   Infraorbital suture R             7 non-null      float64
 5   Infraorbital suture L             7 non-null      float64
 6   Multiple infraorbital foramina R  7 non-null      float64
 7   Multiple infraorbital foramina L  7 non-null      float64
 8   Zygomatico-facial foramina R      7 non-null      float64
 9   Zygomatico-facial foramina L      7 non-null      float64
 10  Condylar canal R                  7 non-null      float64
 11  Condylar canal L                  7 non-null      float64
 12  Flexure of s

In [None]:
# drop the 'Collection', 'SkelID', and 'Lamb sut oss' columns in round2

round2 = round2.drop(columns=['Collection', 'SkelID', 'Lamb sut oss'])
round2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 88 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Supraorbital notch R              7 non-null      float64
 1   Supraorbital notch L              7 non-null      float64
 2   Supraorbital foramen R            7 non-null      float64
 3   Supraorbital foramen L            7 non-null      float64
 4   Infraorbital suture R             7 non-null      float64
 5   Infraorbital suture L             7 non-null      float64
 6   Multiple infraorbital foramina R  7 non-null      float64
 7   Multiple infraorbital foramina L  7 non-null      float64
 8   Zygomatico-facial foramina R      7 non-null      float64
 9   Zygomatico-facial foramina L      7 non-null      float64
 10  Condylar canal R                  7 non-null      float64
 11  Condylar canal L                  7 non-null      float64
 12  Flexure of s

##### Collapse columns


*   Lambdoid ossicle medial and lateral traits were collapsed to a single LBSO (lambdoid suture ossicle) column to make these data compatible with reference/comparative data in future analyses.
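
The collapse rule can be sketched on toy data to show how missing scores behave. The frame `toy` and its column names are hypothetical; the sketch assumes a score of 1 means "present" and NaN means "unobservable".

```python
import numpy as np
import pandas as pd

# Hypothetical scores: 1 = present, 0 = absent, NaN = unobservable.
toy = pd.DataFrame({
    "med R": [0.0, 1.0, np.nan],
    "med L": [0.0, 0.0, np.nan],
    "lat R": [0.0, np.nan, np.nan],
    "lat L": [0.0, 0.0, np.nan],
})

# True when any of the four traits is scored 1; comparing NaN with 1 gives False.
present = (toy == 1).any(axis="columns")

# Cast the boolean flag to 0/1.
toy["LBSO"] = present.astype(int)
print(toy["LBSO"].tolist())  # [0, 1, 0]
```

Under this rule a row in which all four traits are missing collapses to 0 rather than NaN, which is worth keeping in mind when comparing the two rounds.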



In [None]:
# collapse lambdoid ossicle medial and lateral traits for round1

LBSO_columns = ['Lamb oss med R', 'Lamb oss med L', 'Lamb oss lat R', 'Lamb oss lat L']
round1['LBSO'] = (round1[LBSO_columns] == 1).any(axis='columns')

In [None]:
round1['LBSO'].head(10)

0    False
1     True
2     True
3    False
4     True
5     True
6    False
Name: LBSO, dtype: bool

In [None]:
# convert the boolean flag to 0/1 (assigning back avoids chained in-place replacement)
round1['LBSO'] = round1['LBSO'].astype(int)
round1['LBSO'].head(15)

0    0
1    1
2    1
3    0
4    1
5    1
6    0
Name: LBSO, dtype: int64

In [None]:
round1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 89 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Supraorbital notch R              7 non-null      float64
 1   Supraorbital notch L              7 non-null      float64
 2   Supraorbital foramen R            7 non-null      float64
 3   Supraorbital foramen L            7 non-null      float64
 4   Infraorbital suture R             7 non-null      float64
 5   Infraorbital suture L             7 non-null      float64
 6   Multiple infraorbital foramina R  7 non-null      float64
 7   Multiple infraorbital foramina L  7 non-null      float64
 8   Zygomatico-facial foramina R      7 non-null      float64
 9   Zygomatico-facial foramina L      7 non-null      float64
 10  Condylar canal R                  7 non-null      float64
 11  Condylar canal L                  7 non-null      float64
 12  Flexure of s

In [None]:
# collapse lambdoid ossicle medial and lateral traits for round2

LBSO_columns2 = ['Lamb oss med R', 'Lamb oss med L', 'Lamb oss lat R', 'Lamb oss lat L']
round2['LBSO'] = (round2[LBSO_columns2] == 1).any(axis='columns')

In [None]:
round2['LBSO'].head(10)

0    False
1     True
2     True
3     True
4     True
5     True
6     True
Name: LBSO, dtype: bool

In [None]:
# convert the boolean flag to 0/1 (assigning back avoids chained in-place replacement)
round2['LBSO'] = round2['LBSO'].astype(int)
round2['LBSO'].head(15)

0    0
1    1
2    1
3    1
4    1
5    1
6    1
Name: LBSO, dtype: int64

In [None]:
round2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 89 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Supraorbital notch R              7 non-null      float64
 1   Supraorbital notch L              7 non-null      float64
 2   Supraorbital foramen R            7 non-null      float64
 3   Supraorbital foramen L            7 non-null      float64
 4   Infraorbital suture R             7 non-null      float64
 5   Infraorbital suture L             7 non-null      float64
 6   Multiple infraorbital foramina R  7 non-null      float64
 7   Multiple infraorbital foramina L  7 non-null      float64
 8   Zygomatico-facial foramina R      7 non-null      float64
 9   Zygomatico-facial foramina L      7 non-null      float64
 10  Condylar canal R                  7 non-null      float64
 11  Condylar canal L                  7 non-null      float64
 12  Flexure of s

### Rename and strip the column names

#### Round 1

In [None]:
round1 = round1.rename(columns={'Metopism': 'METO',
                                  'Inca': 'INCA',
                                  'Occipito-mastoid suture oss R': 'OMBR',
                                  'Occipito-mastoid suture oss L': 'OMBL',
                                  'Asterionic oss R': 'ASTR',
                                  'Asterionic oss L': 'ASTL',
                                  'Parietal notch oss R': 'PNBR',
                                  'Parietal notch oss L': 'PNBL',
                                  'Pharyngeal fossa': 'PHAR',
                                  'Divided hypoglossal canal R': 'HYPR',
                                  'Divided hypoglossal canal L': 'HYPL',
                                  'Tympanic dihiscence R': 'TYMR',
                                  'Tympanic dihiscence L': 'TYML',
                                  'Pterygo-spinous bridge R': 'CIVR',
                                  'Pterygo-spinous bridge L': 'CIVL',
                                  'Pterygo-alar bridge R': 'PTBR',
                                  'Pterygo-alar bridge L': 'PTBL',
                                  'Supraorbital foramen R': 'SOFR',
                                  'Supraorbital foramen L': 'SOFL',
                                  'Supraorbital notch R': 'SONR',
                                  'Supraorbital notch L': 'SONL',
                                  'Acc mental foramen R': 'MENR',
                                  'Acc mental foramen L': 'MENL',
                                  'Mylohyoid bridge R': 'MHBR',
                                  'Mylohyoid bridge L': 'MHBL',
                                  'Parietal foramen R': 'PFR',
                                  'Parietal foramen L': 'PFL',
                                  'Mastoid foramen number R': 'MFR',
                                  'Mastoid foramen number L': 'MFL',
                                  'Coronal oss R': 'CRBR',
                                  'Coronal oss L': 'CRBL',
                                  'Epipteric oss R': 'EPBR',
                                  'Epipteric oss L': 'EPBL',
                                  'Fronto-temp articulation R': 'FTAR',
                                  'Fronto-temp articulation L': 'FTAL',
                                  'Acc lesser palatine foramen R': 'APFR',
                                  'Acc lesser palatine foramen L': 'APFL',
                                  'Infraorbital suture R': 'IFSR',
                                  'Infraorbital suture L': 'IFSL',
                                  'Multiple infraorbital foramina R': 'MIFR',
                                  'Multiple infraorbital foramina L': 'MIFL',
                                  'Sagittal oss': 'SAGB',
                                  'Bregma oss': 'BREG',
                                  'Palatine torus': 'PALT',
                                  'Mandibular torus': 'MANT',
                                  'Flexure of sup sagittal sulcus': 'SSSF',
                                  'Zygomatico-facial foramina R': 'ZFFR',
                                  'Zygomatico-facial foramina L': 'ZFFL',
                                  'Condylar canal R': 'CCOR',
                                  'Condylar canal L': 'CCOL',
                                  'Foramen ovale incomplete R': 'FOIR',
                                  'Foramen ovale incomplete L': 'FOIL',
                                  'Foramen spinosum incomplete R': 'FSIR',
                                  'Foramen spinosum incomplete L': 'FSIL',
                                  'Auditory exostosis R': 'AUDR',
                                  'Auditory exostosis L': 'AUDL',
                                  'Anterior nasal spine': 'ANS',
                                  'Inferior nasal aperture': 'INA',
                                  'Interorbital breadth': 'IOB',
                                  'Malar tubercle R': 'MTR',
                                  'Malar tubercle L': 'MTL',
                                  'Nasal aperture shape': 'NAS',
                                  'Nasal aperture width': 'NAW',
                                  'Nasal bone contour': 'NBC',
                                  'Nasal bone shape': 'NBS',
                                  'Nasal overgrowth R': 'NOR',
                                  'Nasal overgrowth L': 'NOL',
                                  'Nasofrontal suture': 'NFS',
                                  'Orbital shape': 'OBS',
                                  'Postbregmatic depression': 'PBD',
                                  'Posterior zygomatic tubercle R': 'PZTR',
                                  'Posterior zygomatic tubercle L': 'PZTL',
                                  'Supranasal suture': 'SPS',
                                  'Transverse palatine suture': 'TPS',
                                  'Palate shape': 'PS',
                                  'Zygomaticomaxillary suture R': 'ZSR',
                                  'Zygomaticomaxillary suture L': 'ZSL',
                                  'Mastoid foramen location R': 'MFLoR',
                                  'Mastoid foramen location L': 'MFLoL',
                                  'Lamb oss med R': 'LBMR',
                                  'Lamb oss med L': 'LBML',
                                  'Lamb oss lat R': 'LBLaR',
                                  'Lamb oss lat L': 'LBLaL',
                                  'Apical bone': 'APIC',
                                  'Frontal foramina R': 'FFR',
                                  'Frontal foramina L': 'FFL',
                                  'Male vermiculate pattern': 'MVP',
                                  'Twigs': 'TWIG'})
round1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 89 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   SONR    7 non-null      float64
 1   SONL    7 non-null      float64
 2   SOFR    7 non-null      float64
 3   SOFL    7 non-null      float64
 4   IFSR    7 non-null      float64
 5   IFSL    7 non-null      float64
 6   MIFR    7 non-null      float64
 7   MIFL    7 non-null      float64
 8   ZFFR    7 non-null      float64
 9   ZFFL    7 non-null      float64
 10  CCOR    7 non-null      float64
 11  CCOL    7 non-null      float64
 12  SSSF    7 non-null      float64
 13  FOIR    7 non-null      float64
 14  FOIL    7 non-null      float64
 15  FSIR    7 non-null      float64
 16  FSIL    7 non-null      float64
 17  CIVR    7 non-null      float64
 18  CIVL    7 non-null      float64
 19  PTBR    7 non-null      float64
 20  PTBL    7 non-null      float64
 21  TYMR    7 non-null      float64
 22  TYML  

##### Stripping the variable names
* The variable names did not match as expected, so they are stripped to remove any leading or trailing whitespace.


In [None]:
# strip leading/trailing whitespace from the variable names

round1.columns = round1.columns.str.strip()
for c in round1.columns:
  print(c)

SONR
SONL
SOFR
SOFL
IFSR
IFSL
MIFR
MIFL
ZFFR
ZFFL
CCOR
CCOL
SSSF
FOIR
FOIL
FSIR
FSIL
CIVR
CIVL
PTBR
PTBL
TYMR
TYML
AUDR
AUDL
ANS
INA
IOB
MTR
MTL
NAS
NAW
NBC
NBS
NOR
NOL
NFS
OBS
PBD
PZTR
PZTL
SPS
TPS
ZSR
ZSL
LBMR
LBML
LBLaR
LBLaL
PFR
PFL
MFR
MFL
MFLoR
MFLoL
CRBR
CRBL
EPBR
EPBL
FTAR
FTAL
PNBR
PNBL
ASTR
ASTL
OMBR
OMBL
HYPR
HYPL
APFR
APFL
FFR
FFL
MHBR
MHBL
MENR
MENL
INCA
SAGB
BREG
PALT
PS
APIC
METO
MVP
MANT
PHAR
TWIG
LBSO


#### Round 2

In [None]:
round2 = round2.rename(columns={'Metopism': 'METO',
                                  'Inca': 'INCA',
                                  'Occipito-mastoid suture oss R': 'OMBR',
                                  'Occipito-mastoid suture oss L': 'OMBL',
                                  'Asterionic oss R': 'ASTR',
                                  'Asterionic oss L': 'ASTL',
                                  'Parietal notch oss R': 'PNBR',
                                  'Parietal notch oss L': 'PNBL',
                                  'Pharyngeal fossa': 'PHAR',
                                  'Divided hypoglossal canal R': 'HYPR',
                                  'Divided hypoglossal canal L': 'HYPL',
                                  'Tympanic dihiscence R': 'TYMR',
                                  'Tympanic dihiscence L': 'TYML',
                                  'Pterygo-spinous bridge R': 'CIVR',
                                  'Pterygo-spinous bridge L': 'CIVL',
                                  'Pterygo-alar bridge R': 'PTBR',
                                  'Pterygo-alar bridge L': 'PTBL',
                                  'Supraorbital foramen R': 'SOFR',
                                  'Supraorbital foramen L': 'SOFL',
                                  'Supraorbital notch R': 'SONR',
                                  'Supraorbital notch L': 'SONL',
                                  'Acc mental foramen R': 'MENR',
                                  'Acc mental foramen L': 'MENL',
                                  'Mylohyoid bridge R': 'MHBR',
                                  'Mylohyoid bridge L': 'MHBL',
                                  'Parietal foramen R': 'PFR',
                                  'Parietal foramen L': 'PFL',
                                  'Mastoid foramen number R': 'MFR',
                                  'Mastoid foramen number L': 'MFL',
                                  'Coronal oss R': 'CRBR',
                                  'Coronal oss L': 'CRBL',
                                  'Epipteric oss R': 'EPBR',
                                  'Epipteric oss L': 'EPBL',
                                  'Fronto-temp articulation R': 'FTAR',
                                  'Fronto-temp articulation L': 'FTAL',
                                  'Acc lesser palatine foramen R': 'APFR',
                                  'Acc lesser palatine foramen L': 'APFL',
                                  'Infraorbital suture R': 'IFSR',
                                  'Infraorbital suture L': 'IFSL',
                                  'Multiple infraorbital foramina R': 'MIFR',
                                  'Multiple infraorbital foramina L': 'MIFL',
                                  'Sagittal oss': 'SAGB',
                                  'Bregma oss': 'BREG',
                                  'Palatine torus': 'PALT',
                                  'Mandibular torus': 'MANT',
                                  'Flexure of sup sagittal sulcus': 'SSSF',
                                  'Zygomatico-facial foramina R': 'ZFFR',
                                  'Zygomatico-facial foramina L': 'ZFFL',
                                  'Condylar canal R': 'CCOR',
                                  'Condylar canal L': 'CCOL',
                                  'Foramen ovale incomplete R': 'FOIR',
                                  'Foramen ovale incomplete L': 'FOIL',
                                  'Foramen spinosum incomplete R': 'FSIR',
                                  'Foramen spinosum incomplete L': 'FSIL',
                                  'Auditory exostosis R': 'AUDR',
                                  'Auditory exostosis L': 'AUDL',
                                  'Anterior nasal spine': 'ANS',
                                  'Inferior nasal aperture': 'INA',
                                  'Interorbital breadth': 'IOB',
                                  'Malar tubercle R': 'MTR',
                                  'Malar tubercle L': 'MTL',
                                  'Nasal aperture shape': 'NAS',
                                  'Nasal aperture width': 'NAW',
                                  'Nasal bone contour': 'NBC',
                                  'Nasal bone shape': 'NBS',
                                  'Nasal overgrowth R': 'NOR',
                                  'Nasal overgrowth L': 'NOL',
                                  'Nasofrontal suture': 'NFS',
                                  'Orbital shape': 'OBS',
                                  'Postbregmatic depression': 'PBD',
                                  'Posterior zygomatic tubercle R': 'PZTR',
                                  'Posterior zygomatic tubercle L': 'PZTL',
                                  'Supranasal suture': 'SPS',
                                  'Transverse palatine suture': 'TPS',
                                  'Palate shape': 'PS',
                                  'Zygomaticomaxillary suture R': 'ZSR',
                                  'Zygomaticomaxillary suture L': 'ZSL',
                                  'Mastoid foramen location R': 'MFLoR',
                                  'Mastoid foramen location L': 'MFLoL',
                                  'Lamb oss med R': 'LBMR',
                                  'Lamb oss med L': 'LBML',
                                  'Lamb oss lat R': 'LBLaR',
                                  'Lamb oss lat L': 'LBLaL',
                                  'Apical bone': 'APIC',
                                  'Frontal foramina R': 'FFR',
                                  'Frontal foramina L': 'FFL',
                                  'Male vermiculate pattern': 'MVP',
                                  'Twigs': 'TWIG'})
round2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 89 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   SONR    7 non-null      float64
 1   SONL    7 non-null      float64
 2   SOFR    7 non-null      float64
 3   SOFL    7 non-null      float64
 4   IFSR    7 non-null      float64
 5   IFSL    7 non-null      float64
 6   MIFR    7 non-null      float64
 7   MIFL    7 non-null      float64
 8   ZFFR    7 non-null      float64
 9   ZFFL    7 non-null      float64
 10  CCOR    7 non-null      float64
 11  CCOL    7 non-null      float64
 12  SSSF    7 non-null      float64
 13  FOIR    7 non-null      float64
 14  FOIL    7 non-null      float64
 15  FSIR    7 non-null      float64
 16  FSIL    7 non-null      float64
 17  CIVR    7 non-null      float64
 18  CIVL    7 non-null      float64
 19  PTBR    7 non-null      float64
 20  PTBL    7 non-null      float64
 21  TYMR    7 non-null      float64
 22  TYML  

##### Stripping the variable names
* The variable names did not match as expected, so they are stripped to remove any leading or trailing whitespace.
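
One way to see which names are affected before stripping them is to compare each name with its stripped form. This is a sketch on a hypothetical empty frame `df`; the padded names are made up for illustration.

```python
import pandas as pd

# Hypothetical column names; two of them carry stray whitespace.
df = pd.DataFrame(columns=["SONR", "SONL ", " SOFR"])

# Names that change when stripped, printed with repr so the spaces are visible.
padded = [c for c in df.columns if c != c.strip()]
print([repr(c) for c in padded])

# Strip all names in one vectorised call.
df.columns = df.columns.str.strip()
print(list(df.columns))
```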


In [None]:
# strip leading/trailing whitespace from the variable names

round2.columns = round2.columns.str.strip()
for c in round2.columns:
  print(c)

SONR
SONL
SOFR
SOFL
IFSR
IFSL
MIFR
MIFL
ZFFR
ZFFL
CCOR
CCOL
SSSF
FOIR
FOIL
FSIR
FSIL
CIVR
CIVL
PTBR
PTBL
TYMR
TYML
AUDR
AUDL
ANS
INA
IOB
MTR
MTL
NAS
NAW
NBC
NBS
NOR
NOL
NFS
OBS
PBD
PZTR
PZTL
SPS
TPS
ZSR
ZSL
LBMR
LBML
LBLaR
LBLaL
PFR
PFL
MFR
MFL
MFLoR
MFLoL
CRBR
CRBL
EPBR
EPBL
FTAR
FTAL
PNBR
PNBL
ASTR
ASTL
OMBR
OMBL
HYPR
HYPL
APFR
APFL
FFR
FFL
MHBR
MHBL
MENR
MENL
INCA
SAGB
BREG
PALT
PS
APIC
METO
MVP
MANT
PHAR
TWIG
LBSO


In [None]:
round2.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,PZTL,SPS,TPS,ZSR,ZSL,LBMR,LBML,LBLaR,LBLaL,PFR,PFL,MFR,MFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,1.0,2.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,1.0,1.0,2.0,1.0,1.0,2.0,,,,,1.0,0.0,2.0,1.0,4.0,1.0,,,,,,,,,,,,,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,1.0,2.0,,0.0,0.0,0.0,0.0,0.0,0
1,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,0.0,0.0,0.0,1.0,0.0,2.0,3.0,4.0,1.0,1.0,1.0,,,1.0,1.0,1.0,0.0,2.0,2.0,4.0,1.0,,,,,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,,0.0,0.0,0.0,0.0,1.0,1
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,3.0,3.0,0.0,0.0,2.0,1.0,0.0,3.0,3.0,2.0,3.0,2.0,2.0,,,,1.0,1.0,0.0,1.0,1.0,2.0,1.0,,,,,,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,,0.0,0.0,1.0,0.0,,1
4,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,1.0,2.0,,,,,4.0,1.0,0.0,2.0,3.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,,1.0,0.0,1.0,1.0,0.0,0.0,1
5,1.0,1.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,,5.0,2.0,2.0,2.0,,,0.0,3.0,1.0,1.0,2.0,3.0,1.0,1.0,1.0,0.0,,2.0,2.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,4.0,5.0,1.0,1.0,1.0,,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,,1.0,0.0,1.0,1.0,0.0,1.0,1
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,,,1.0,1.0,2.0,1.0,4.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,0.0,0.0,1.0,1.0,0.0,1.0,1


#### Replace blank cells with NaN

In [None]:
#round1 = round1.replace(r'^\s*$', np.nan, regex=True)

In [None]:
round1.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,PZTL,SPS,TPS,ZSR,ZSL,LBMR,LBML,LBLaR,LBLaL,PFR,PFL,MFR,MFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,1.0,3.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,1.0,1.0,2.0,2.0,1.0,2.0,,,,,1.0,0.0,2.0,1.0,4.0,1.0,,,,,,,,,,,,,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,1.0,2.0,,0.0,0.0,0.0,0.0,0.0,0
1,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,0.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,3.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,1.0,1.0,0.0,0.0,1
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,0.0,0.0,4.0,1.0,0.0,3.0,3.0,2.0,1.0,,1.0,0.0,1.0,0.0,1.0,1.0,0.0,2.0,2.0,4.0,4.0,,,0.0,0.0,0.0,0.0,0.0,,1.0,0.0,0.0,0.0,4.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,1
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,3.0,2.0,0.0,0.0,2.0,3.0,0.0,2.0,3.0,0.0,,2.0,2.0,,,,,1.0,1.0,1.0,1.0,2.0,1.0,,,,,,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,,0.0,0.0,1.0,0.0,,0
4,1.0,1.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,3.0,1.0,2.0,,,,,4.0,1.0,0.0,3.0,3.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,0.0,0.0,,0.0,0.0,1.0,,1.0,0.0,1.0,1.0,0.0,0.0,1
5,4.0,2.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,,5.0,2.0,2.0,2.0,,,0.0,3.0,1.0,,2.0,3.0,0.0,1.0,2.0,0.0,,2.0,2.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,1.0,1.0,1.0,,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,,1.0,0.0,1.0,,0.0,,1
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,1.0,1.0,1.0,2.0,2.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,,0.0,,,1.0,1.0,2.0,1.0,4.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,0.0,0.0,1.0,1.0,0.0,,0


In [None]:
#round2 = round2.replace(r'^\s*$', np.nan, regex=True)

In [None]:
round2.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,PZTL,SPS,TPS,ZSR,ZSL,LBMR,LBML,LBLaR,LBLaL,PFR,PFL,MFR,MFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,1.0,2.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,1.0,1.0,2.0,1.0,1.0,2.0,,,,,1.0,0.0,2.0,1.0,4.0,1.0,,,,,,,,,,,,,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,1.0,2.0,,0.0,0.0,0.0,0.0,0.0,0
1,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,0.0,0.0,0.0,1.0,0.0,2.0,3.0,4.0,1.0,1.0,1.0,,,1.0,1.0,1.0,0.0,2.0,2.0,4.0,1.0,,,,,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,,0.0,0.0,0.0,0.0,1.0,1
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,3.0,3.0,0.0,0.0,2.0,1.0,0.0,3.0,3.0,2.0,3.0,2.0,2.0,,,,1.0,1.0,0.0,1.0,1.0,2.0,1.0,,,,,,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,,0.0,0.0,1.0,0.0,,1
4,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,1.0,2.0,,,,,4.0,1.0,0.0,2.0,3.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,,1.0,0.0,1.0,1.0,0.0,0.0,1
5,1.0,1.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,,5.0,2.0,2.0,2.0,,,0.0,3.0,1.0,1.0,2.0,3.0,1.0,1.0,1.0,0.0,,2.0,2.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,4.0,5.0,1.0,1.0,1.0,,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,,1.0,0.0,1.0,1.0,0.0,1.0,1
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,,,1.0,1.0,2.0,1.0,4.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,3.0,0.0,0.0,1.0,1.0,0.0,1.0,1


## Remove NaNs with a for loop
#### 1) Check round1 for NaNs. For each NaN in round1, check the corresponding cell in round2. If round2 has a value there, copy it into round1; if round2 is also NaN, set both cells to 0.
#### 2) Find all remaining NaNs in round2 and replace them with the corresponding values from round1. (round1 needs no further check, since Step 1 already removed all of its NaNs.)
* This method is appropriate for an intraobserver analysis. If a trait was scored in only one round (or in neither), that pair carries no information about intraobserver error, and comparing a NaN against an actual score would give a misleading result for that trait. Filling the gap makes the two cells identical, so the pair is counted as an agreement.
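
The two steps can also be sketched without explicit loops. This is a minimal alternative, not the notebook's own function: `fill_nan_pairs` and the toy trait codes are hypothetical, and it assumes both tables share the same index and column labels (the loop version pairs columns by position instead).

```python
import numpy as np
import pandas as pd

def fill_nan_pairs(df1, df2):
    """Return NaN-free copies of two equally shaped, equally labelled score tables."""
    # Step 1: where round 1 is missing, take the round-2 score; if that is
    # missing too, fall back to 0.
    filled1 = df1.fillna(df2).fillna(0)
    # Step 2: where round 2 is missing, take the (now complete) round-1 score,
    # which is 0 for cells that were missing in both rounds.
    filled2 = df2.fillna(filled1)
    return filled1, filled2

# Toy data with hypothetical trait codes "A" and "B"
toy1 = pd.DataFrame({"A": [1.0, np.nan, np.nan], "B": [0.0, 2.0, 3.0]})
toy2 = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [np.nan, 2.0, 1.0]})
filled1, filled2 = fill_nan_pairs(toy1, toy2)
print(filled1)
print(filled2)
```

Unlike the in-place loop, this sketch returns new DataFrames and leaves the inputs untouched, so the raw scores stay available for checking how many cells were filled.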

In [None]:
def remove_nan(round1, round2):
    """Fill NaNs in both rounds in place (steps 1 and 2 above) and return them."""
    column1 = list(round1.columns)
    column2 = list(round2.columns)
    for i in range(len(column1)):
        # positional row indices of the NaNs in column i of each round
        na_indx1 = list(np.where(round1[column1[i]].isna()))
        na_indx2 = list(np.where(round2[column2[i]].isna()))
        print(na_indx1)
        for j in na_indx1[0]:
            # .iloc writes to the DataFrame itself; chained indexing such as
            # round1[col][j] = ... triggers SettingWithCopyWarning
            if not np.isnan(round2.iloc[j, i]):
                # step 1: only round1 is missing, so copy the round2 value
                round1.iloc[j, i] = round2.iloc[j, i]
            else:
                # step 1: both rounds are missing, so set both cells to 0
                round2.iloc[j, i] = 0
                round1.iloc[j, i] = 0
        for j in na_indx2[0]:
            # step 2: round2 was missing, so copy the (now NaN-free) round1 value
            round2.iloc[j, i] = round1.iloc[j, i]
    return round1, round2

In [None]:
remove_nan(round1, round2)

[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([5])]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([5])]
[array([5])]
[array([4])]
[array([4])]
[array([4])]
[array([4, 5])]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([3, 5])]
[array([2])]
[a

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  from ipykernel import kernelapp as app
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  app.launch_new_instance()
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.


(   SONR  SONL  SOFR  SOFL  IFSR  IFSL  MIFR  MIFL  ZFFR  ZFFL  CCOR  CCOL  SSSF  FOIR  FOIL  FSIR  FSIL  CIVR  CIVL  PTBR  PTBL  TYMR  TYML  AUDR  AUDL  ANS  INA  IOB  MTR  MTL  NAS  NAW  NBC  NBS  NOR  NOL  NFS  OBS  PBD  PZTR  PZTL  SPS  TPS  ZSR  ZSL  LBMR  LBML  LBLaR  LBLaL  PFR  PFL  MFR  MFL  MFLoR  MFLoL  CRBR  CRBL  EPBR  EPBL  FTAR  FTAL  PNBR  PNBL  ASTR  ASTL  OMBR  OMBL  HYPR  HYPL  APFR  APFL  FFR  FFL  MHBR  MHBL  MENR  MENL  INCA  SAGB  BREG  PALT   PS  APIC  METO  MVP  MANT  PHAR  TWIG  LBSO
 0   1.0   1.0   0.0   0.0   0.0   0.0   0.0   2.0   2.0   1.0   1.0   0.0   1.0   0.0   0.0   0.0   0.0   1.0   2.0   0.0   1.0   0.0   0.0   0.0   0.0  2.0  4.0  2.0  1.0  1.0  1.0  3.0  0.0  3.0  0.0  0.0  4.0  3.0  0.0   1.0   1.0  2.0  2.0  1.0  2.0   0.0   0.0    0.0    0.0  1.0  0.0  2.0  1.0    4.0    1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   3.0   0.0   0.0  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0  2.0   0.0   0.0

In [None]:
round1['SAGB']

0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
5    0.0
6    0.0
Name: SAGB, dtype: float64

In [None]:
round2['SAGB']

0    0.0
1    0.0
2    0.0
3    0.0
4    1.0
5    0.0
6    0.0
Name: SAGB, dtype: float64

In [None]:
'''
r1 = pd.DataFrame()
r2 = pd.DataFrame()
'''

In [None]:
#print(round2[column[1]][j])

#r1, r2 = remove_nan(round1, round2)    # prints the location (indices) of all NaNs in the r1 and r2 DFs

[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]
[array([], dtype=int64)]


In [None]:
#r2.head()

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,...,PFR,PFL,MFR,MFL,MFLR,MFLL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1,1,0,0,0,0,0,0,2,1,1,0,1,0,0,0,0,1,2,0,1,0,0,0,0,2.0,4,2,1,1,1.0,2.0,0.0,3.0,0.0,0.0,4,3,0,1,...,1,0,2,1,4,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,3,0,0,0,0,0,0,0,0,0,0.0,0.0,1.0,2.0,0.0,0,0,0,0,0.0,0
1,0,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,2.0,2,2,1,1,1.0,2.0,1.0,3.0,1.0,1.0,2,1,1,1,...,1,1,3,3,1,1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0,0,1,2,0,0,0,2,0,0,0,0.0,0.0,1.0,3.0,0.0,0,0,1,0,0.0,1
2,0,0,1,0,0,0,2,2,2,2,0,1,1,0,0,0,0,1,1,1,0,0,0,0,0,3.0,2,1,1,1,2.0,2.0,4.0,4.0,0.0,0.0,0,1,0,2,...,1,0,2,2,4,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3,4,1,1,0,0,0,0,0,0,0,0.0,0.0,1.0,3.0,0.0,0,0,0,0,1.0,1
3,0,0,2,1,0,0,0,1,2,2,0,0,1,1,1,0,0,1,1,0,0,0,0,0,0,1.0,3,1,2,1,1.0,2.0,3.0,3.0,0.0,0.0,2,1,0,3,...,1,0,1,1,2,1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0,0,0,1,1,0,1,0,0,0,0,0.0,0.0,1.0,3.0,0.0,0,0,1,0,0.0,1
4,1,0,0,1,2,2,2,0,2,2,1,0,1,1,0,0,1,1,1,0,0,0,0,0,0,2.0,2,2,3,2,1.0,2.0,0.0,0.0,0.0,0.0,4,1,0,2,...,0,1,2,1,4,1,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0,0,1,1,0,0,1,1,0,0,0,1.0,0.0,1.0,0.0,1.0,0,1,1,0,0.0,1


In [None]:
round2.head()

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,PZTL,SPS,TPS,ZSR,ZSL,LBMR,LBML,LBLaR,LBLaL,PFR,PFL,MFR,MFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,1.0,2.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,1.0,1.0,2.0,1.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,1.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,0.0,0.0,0.0,1.0,0.0,2.0,3.0,4.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,2.0,2.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,3.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,1
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,3.0,3.0,0.0,0.0,2.0,1.0,0.0,3.0,3.0,2.0,3.0,2.0,2.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1
4,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,2.0,3.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1


In [None]:
round1.head()

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAS,NAW,NBC,NBS,NOR,NOL,NFS,OBS,PBD,PZTR,PZTL,SPS,TPS,ZSR,ZSL,LBMR,LBML,LBLaR,LBLaL,PFR,PFL,MFR,MFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,INCA,SAGB,BREG,PALT,PS,APIC,METO,MVP,MANT,PHAR,TWIG,LBSO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,1.0,3.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,1.0,1.0,2.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,1.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0
1,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,0.0,3.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,3.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,1.0,1.0,0.0,0.0,1
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,4.0,4.0,0.0,0.0,4.0,1.0,0.0,3.0,3.0,2.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,2.0,2.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,4.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,1
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,3.0,2.0,0.0,0.0,2.0,3.0,0.0,2.0,3.0,0.0,3.0,2.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,0
4,1.0,1.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,3.0,1.0,2.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,3.0,3.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1


### Separate weighted/unweighted
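
The Round 1 and Round 2 cells in this section pass the same two column lists to `filter`. A minimal sketch of defining each list once and reusing it for both rounds follows. The names `ORDINAL_TRAITS`, `NOMINAL_TRAITS`, and `split_by_scale` are hypothetical, and the lists are abbreviated to three trait codes each for illustration.

```python
import pandas as pd

# Abbreviated, illustrative lists; the full lists are the ones used in this section.
ORDINAL_TRAITS = ["SONR", "SONL", "NAW"]   # ordered scores -> weighted kappa
NOMINAL_TRAITS = ["CCOR", "CCOL", "NAS"]   # unordered categories -> unweighted kappa

def split_by_scale(df, ordinal=ORDINAL_TRAITS, nominal=NOMINAL_TRAITS):
    """Return (weighted_df, unweighted_df) for one round of scores."""
    ordinal, nominal = list(ordinal), list(nominal)
    overlap = set(ordinal) & set(nominal)
    if overlap:
        raise ValueError(f"traits listed as both ordinal and nominal: {sorted(overlap)}")
    missing = [c for c in ordinal + nominal if c not in df.columns]
    if missing:
        raise KeyError(f"traits not found in the data: {missing}")
    return df[ordinal], df[nominal]

# Toy round with made-up scores
toy_round = pd.DataFrame({"SONR": [1, 0], "SONL": [1, 0], "NAW": [3, 2],
                          "CCOR": [1, 1], "CCOL": [0, 1], "NAS": [1, 2]})
toy_weighted, toy_unweighted = split_by_scale(toy_round)
print(list(toy_weighted.columns), list(toy_unweighted.columns))
```

One design difference worth noting: `DataFrame.filter(items=...)` silently skips names that are not in the data, whereas this sketch raises an error, which makes a misspelled trait code visible.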

#### Round 1

In [None]:
# ordinal columns will be weighted for Cohen's kappa
# create a new df for weighted columns

round1_weighted = round1.filter(['SONR', 'SONL',
                                 'SOFR', 'SOFL',
                                 'IFSR', 'IFSL',
                                 'MIFR', 'MIFL',
                                 'ZFFR', 'ZFFL',
                                 'CIVR', 'CIVL',
                                 'PTBR', 'PTBL',
                                 'TYMR', 'TYML',
                                 'AUDR', 'AUDL',
                                 'ANS', 'INA', 'IOB',
                                 'MTR', 'MTL',
                                 'NAW',
                                 'PZTR', 'PZTL',
                                 'ZSR', 'ZSL',
                                 'MFR', 'MFL',
                                 'HYPR', 'HYPL',
                                 'APFR', 'APFL',
                                 'FFR', 'FFL',
                                 'MHBR', 'MHBL',
                                 'MENR', 'MENL',
                                 'SAGB', 'METO'], axis=1)

round1_weighted.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAW,PZTR,PZTL,ZSR,ZSL,MFR,MFL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,SAGB,METO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,3.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,1.0,2.0,2.0,2.0,3.0,3.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,3.0,3.0,1.0,1.0,2.0,2.0,4.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,3.0,2.0,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,1.0,1.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,2.0,2.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
5,4.0,2.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# nominal columns will be unweighted for Cohen's kappa
# create a new df for unweighted columns

round1_unweighted = round1.filter(['CCOR', 'CCOL',
                                   'SSSF',
                                   'FOIR', 'FOIL',
                                   'FSIR', 'FSIL',
                                   'NAS', 'NBC', 'NBS',
                                   'NOR', 'NOL',
                                   'NFS', 'OBS', 'PBD', 'SPS', 'TPS', 'PS',
                                   'PFR', 'PFL',
                                   'MFLoR', 'MFLoL',
                                   'CRBR', 'CRBL',
                                   'EPBR', 'EPBL',
                                   'FTAR', 'FTAL',
                                   'PNBR', 'PNBL',
                                   'ASTR', 'ASTL',
                                   'OMBR', 'OMBL',
                                   'INCA', 'BREG', 'PALT',
                                   'MANT', 'PHAR', 'APIC',
                                   'LBMR', 'LBML',
                                   'LBLaR', 'LBLaL',
                                   'LBSO', 'MVP', 'TWIG'], axis=1)
round1_unweighted.head(10)

Unnamed: 0,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,NAS,NBC,NBS,NOR,NOL,NFS,OBS,PBD,SPS,TPS,PS,PFR,PFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,INCA,BREG,PALT,MANT,PHAR,APIC,LBMR,LBML,LBLaR,LBLaL,LBSO,MVP,TWIG
0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,2.0,2.0,2.0,1.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0
1,1.0,1.0,1.0,0.0,0.0,0.0,0.0,2.0,0.0,3.0,1.0,1.0,2.0,1.0,1.0,2.0,3.0,3.0,1.0,1.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1,1.0,0.0
2,0.0,1.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,4.0,0.0,0.0,4.0,1.0,0.0,2.0,1.0,3.0,1.0,0.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1,0.0,1.0
3,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,2.0,0.0,0.0,2.0,3.0,0.0,0.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0,0.0,0.0
4,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1,1.0,0.0
5,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,1.0,2.0,3.0,0.0,0.0,0.0,0.0,1.0,1.0,4.0,4.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1,1.0,1.0
6,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,0.0,1.0,3.0,1.0,1.0,4.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0,1.0,1.0


#### Round 2

In [None]:
# ordinal columns will be weighted for Cohen's kappa
# create a new df for weighted columns

round2_weighted = round2.filter(['SONR', 'SONL',
                                 'SOFR', 'SOFL',
                                 'IFSR', 'IFSL',
                                 'MIFR', 'MIFL',
                                 'ZFFR', 'ZFFL',
                                 'CIVR', 'CIVL',
                                 'PTBR', 'PTBL',
                                 'TYMR', 'TYML',
                                 'AUDR', 'AUDL',
                                 'ANS', 'INA', 'IOB',
                                 'MTR', 'MTL',
                                 'NAW',
                                 'PZTR', 'PZTL',
                                 'ZSR', 'ZSL',
                                 'MFR', 'MFL',
                                 'HYPR', 'HYPL',
                                 'APFR', 'APFL',
                                 'FFR', 'FFL',
                                 'MHBR', 'MHBL',
                                 'MENR', 'MENL',
                                 'SAGB', 'METO'], axis=1)

round2_weighted.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAW,PZTR,PZTL,ZSR,ZSL,MFR,MFL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,SAGB,METO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,3.0,3.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,1.0,1.0,2.0,2.0,3.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,2.0,1.0,2.0,3.0,3.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,2.0,2.0,3.0,2.0,2.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0
5,1.0,1.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,2.0,2.0,0.0,1.0,1.0,2.0,2.0,2.0,2.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# nominal columns will be unweighted for Cohen's kappa
# create a new df for unweighted columns

round2_unweighted = round2.filter(['CCOR', 'CCOL',
                                   'SSSF',
                                   'FOIR', 'FOIL',
                                   'FSIR', 'FSIL',
                                   'NAS', 'NBC', 'NBS',
                                   'NOR', 'NOL',
                                   'NFS', 'OBS', 'PBD', 'SPS', 'TPS', 'PS',
                                   'PFR', 'PFL',
                                   'MFLoR', 'MFLoL',
                                   'CRBR', 'CRBL',
                                   'EPBR', 'EPBL',
                                   'FTAR', 'FTAL',
                                   'PNBR', 'PNBL',
                                   'ASTR', 'ASTL',
                                   'OMBR', 'OMBL',
                                   'INCA', 'BREG', 'PALT',
                                   'MANT', 'PHAR', 'APIC',
                                   'LBMR', 'LBML',
                                   'LBLaR', 'LBLaL',
                                   'LBSO', 'MVP', 'TWIG'], axis=1)
round2_unweighted.head(10)

Unnamed: 0,CCOR,CCOL,SSSF,FOIR,FOIL,FSIR,FSIL,NAS,NBC,NBS,NOR,NOL,NFS,OBS,PBD,SPS,TPS,PS,PFR,PFL,MFLoR,MFLoL,CRBR,CRBL,EPBR,EPBL,FTAR,FTAL,PNBR,PNBL,ASTR,ASTL,OMBR,OMBL,INCA,BREG,PALT,MANT,PHAR,APIC,LBMR,LBML,LBLaR,LBLaL,LBSO,MVP,TWIG
0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,4.0,3.0,0.0,2.0,1.0,2.0,1.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0
1,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,3.0,1.0,1.0,2.0,1.0,1.0,2.0,3.0,3.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1,0.0,0.0
2,0.0,1.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,4.0,0.0,0.0,0.0,1.0,0.0,4.0,1.0,3.0,1.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1,0.0,1.0
3,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,3.0,0.0,0.0,2.0,1.0,0.0,2.0,3.0,3.0,1.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1,0.0,0.0
4,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,4.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1,1.0,0.0
5,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,1.0,2.0,3.0,1.0,0.0,0.0,0.0,1.0,1.0,4.0,5.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1,1.0,1.0
6,1.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,3.0,3.0,1.0,1.0,2.0,1.0,0.0,0.0,1.0,3.0,1.0,1.0,4.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1,1.0,1.0


In [None]:
round1_weighted.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAW,PZTR,PZTL,ZSR,ZSL,MFR,MFL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,SAGB,METO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,3.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,0.0,1.0,2.0,1.0,2.0,2.0,2.0,3.0,3.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,3.0,3.0,1.0,1.0,2.0,2.0,4.0,3.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,2.0,3.0,2.0,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,1.0,1.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,3.0,2.0,3.0,3.0,2.0,2.0,2.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
5,4.0,2.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,2.0,2.0,0.0,1.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
round2_weighted.head(10)

Unnamed: 0,SONR,SONL,SOFR,SOFL,IFSR,IFSL,MIFR,MIFL,ZFFR,ZFFL,CIVR,CIVL,PTBR,PTBL,TYMR,TYML,AUDR,AUDL,ANS,INA,IOB,MTR,MTL,NAW,PZTR,PZTL,ZSR,ZSL,MFR,MFL,HYPR,HYPL,APFR,APFL,FFR,FFL,MHBR,MHBL,MENR,MENL,SAGB,METO
0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,4.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,3.0,3.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,1.0,1.0,1.0,2.0,2.0,3.0,1.0,1.0,2.0,2.0,3.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,2.0,1.0,2.0,3.0,3.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,1.0,2.0,2.0,2.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,3.0,2.0,2.0,2.0,3.0,2.0,2.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0
5,1.0,1.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,2.0,2.0,0.0,1.0,1.0,2.0,2.0,2.0,2.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
6,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


---

## Cohen's kappa

In [None]:
from sklearn.metrics import cohen_kappa_score
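
For reference, the statistic computed by `cohen_kappa_score` can be written as follows. The symbols $O$, $E$, and $w$ are introduced here for illustration and do not appear elsewhere in the notebook. $O_{ij}$ is the number of observations scored as category $i$ in round 1 and category $j$ in round 2, $n$ is the number of pairs, and $E_{ij}$ is the count expected by chance from the marginal totals:

$$
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}},
\qquad
E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}
$$

With no weights, $w_{ij} = 0$ for $i = j$ and $1$ otherwise, so every disagreement counts equally. With `weights='quadratic'`, $w_{ij} = (i - j)^2$, where $i$ and $j$ are the positions of the categories in sorted order, so disagreements between distant ordinal scores are penalized more heavily.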

#### Weighted Cohen's kappa 1x1

In [None]:
'''round1_SONR = round1.filter(['SONR'], axis=1)
round1_SONR.info()
'''

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   SONR    7 non-null      int64
dtypes: int64(1)
memory usage: 184.0 bytes


In [None]:
'''
round2_SONR = round2.filter(['SONR'], axis=1)
round2_SONR.info()
'''

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   SONR    7 non-null      int64
dtypes: int64(1)
memory usage: 184.0 bytes


In [None]:
'''
cohen_SONR = cohen_kappa_score(round1_SONR, round2_SONR)
cohen_SONR
'''

1.0

In [None]:
'''
cohen_SONR = cohen_kappa_score(round1_weighted['SONR'], round2_weighted['SONR'],
                               weights='quadratic')
cohen_SONR
'''

0.8108108108108107

In [None]:
'''
cohen_SONL = cohen_kappa_score(round1['SONL'], round2['SONL'], weights='quadratic')
cohen_SONL
'''

1.0

In [None]:
'''
cohen_SOFR = cohen_kappa_score(round1['SOFR'], round2['SOFR'], weights='quadratic')
cohen_SOFR
'''

1.0

In [None]:
#cohen_PBD = cohen_kappa_score(round1['PBD'], round2['PBD'], weights='quadratic')
#cohen_PBD

0.588235294117647

In [None]:
'''
cohen_SOFL = cohen_kappa_score(round1['SOFL'], round2['SOFL'], weights='quadratic')
cohen_SOFL
'''

1.0

In [None]:
#round1_weighted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 40 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   SONR    7 non-null      int64
 1   SONL    7 non-null      int64
 2   SOFR    7 non-null      int64
 3   SOFL    7 non-null      int64
 4   IFSR    7 non-null      int64
 5   IFSL    7 non-null      int64
 6   MIFR    7 non-null      int64
 7   MIFL    7 non-null      int64
 8   ZFFR    7 non-null      int64
 9   ZFFL    7 non-null      int64
 10  CIVR    7 non-null      int64
 11  CIVL    7 non-null      int64
 12  PTBR    7 non-null      int64
 13  PTBL    7 non-null      int64
 14  TYMR    7 non-null      int64
 15  TYML    7 non-null      int64
 16  AUDR    7 non-null      int64
 17  AUDL    7 non-null      int64
 18  ANS     7 non-null      int64
 19  INA     7 non-null      int64
 20  IOB     7 non-null      int64
 21  MTR     7 non-null      int64
 22  MTL     7 non-null      int64
 23  NAW     7 non-null 

### Cohen's weighted kappa: Function and loop
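
As an alternative sketch, the per-trait scores can be collected by column name and turned into a results table in one step. `kappa_table` and the toy trait codes `X` and `Y` are hypothetical, and the sketch assumes both DataFrames use the same column labels.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

def kappa_table(df1, df2, weights="quadratic"):
    """One row per column of df1: trait code, number of pairs, Cohen's kappa."""
    rows = [
        {
            "Trait Code": col,
            "n pairs": len(df1[col]),
            "Cohen Score": round(
                float(cohen_kappa_score(df1[col], df2[col], weights=weights)), 5
            ),
        }
        for col in df1.columns
    ]
    return pd.DataFrame(rows, columns=["Trait Code", "n pairs", "Cohen Score"])

# Toy data: X agrees perfectly, Y differs in one of four pairs
toy1 = pd.DataFrame({"X": [0, 1, 2, 1], "Y": [0, 0, 0, 0]})
toy2 = pd.DataFrame({"X": [0, 1, 2, 1], "Y": [0, 0, 1, 0]})
table = kappa_table(toy1, toy2)
print(table)
```

Passing the DataFrames as arguments, rather than reading `round1_weighted` and `round2_weighted` as globals, lets the same function serve the unweighted traits by calling it with `weights=None`.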

In [None]:
def run_cohen_kappa_weighted(index):
    """Print and return the quadratic-weighted Cohen's kappa for the trait at position `index`."""
    column1 = list(round1_weighted.columns)
    column2 = list(round2_weighted.columns)

    # quadratic weights penalize larger ordinal disagreements more heavily
    y = cohen_kappa_score(round1_weighted[column1[index]], round2_weighted[column2[index]], weights='quadratic')
    print(y)
    return y

In [None]:
column_names = ['Trait Code', 'n pairs', 'Cohen Score']                 # create column names for new df
cnm_Cohen_weighted_output = pd.DataFrame(columns = column_names)      # create new df for Cohen results
column1 = list(round1_weighted.columns)

for i in range(len(column1)):
    Cohen_score = run_cohen_kappa_weighted(i)
    n_pairs = round1_weighted[column1[i]].size      # number of paired observations for this trait
    # add one row per trait: code, number of pairs, kappa rounded to 5 decimals
    cnm_Cohen_weighted_output.loc[i] = [column1[i], n_pairs, round(Cohen_score, 5)]

0.8108108108108107
0.65
1.0
1.0
1.0
1.0
0.9041095890410958
0.46153846153846145
1.0
1.0
1.0
0.588235294117647
0.588235294117647
1.0
nan
nan
nan
nan
1.0
0.8870967741935484
1.0
0.7941176470588235
0.8108108108108107
0.7741935483870968
0.676923076923077
0.9090909090909091
1.0
1.0
1.0
1.0
0.7741935483870968
0.8108108108108107
0.8444444444444444
0.8
1.0
nan
1.0
1.0
nan
nan
0.0
nan


  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)


In [None]:
# print entire weighted output

print(cnm_Cohen_weighted_output)

   Trait Code n pairs  Cohen Score
0        SONR       7      0.81081
1        SONL       7      0.65000
2        SOFR       7      1.00000
3        SOFL       7      1.00000
4        IFSR       7      1.00000
5        IFSL       7      1.00000
6        MIFR       7      0.90411
7        MIFL       7      0.46154
8        ZFFR       7      1.00000
9        ZFFL       7      1.00000
10       CIVR       7      1.00000
11       CIVL       7      0.58824
12       PTBR       7      0.58824
13       PTBL       7      1.00000
14       TYMR       7          NaN
15       TYML       7          NaN
16       AUDR       7          NaN
17       AUDL       7          NaN
18        ANS       7      1.00000
19        INA       7      0.88710
20        IOB       7      1.00000
21        MTR       7      0.79412
22        MTL       7      0.81081
23        NAW       7      0.77419
24       PZTR       7      0.67692
25       PZTL       7      0.90909
26        ZSR       7      1.00000
27        ZSL       

##### Replace the NaN Cohen scores with 1.0
* Cohen's kappa is undefined when all values from both rounds for a single trait are 0: the agreement expected by chance then equals the observed agreement, and the calculation reduces to 0/0. The cohen_kappa_score function from sklearn therefore returns NaN for those traits, which you must override to 1. The logic behind this is: if all values from both observation rounds are 0, then the Level of Agreement is perfect (1.0).
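
To see where the NaN comes from, the quadratic-weighted calculation can be written out by hand. The block below is a minimal, dependency-free sketch and not sklearn's actual code; `quadratic_weighted_kappa` is a hypothetical helper name, and it returns NaN explicitly where a NumPy-based division of 0 by 0 would produce NaN with a warning.

```python
import math

def quadratic_weighted_kappa(round1, round2):
    """Cohen's kappa with quadratic weights for two equal-length lists of scores."""
    labels = sorted(set(round1) | set(round2))
    pos = {label: i for i, label in enumerate(labels)}
    k, n = len(labels), len(round1)

    # Count how often each (round1 score, round2 score) pair occurs
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(round1, round2):
        observed[pos[a]][pos[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement = 0.0  # weighted disagreement actually observed
    chance = 0.0        # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2  # quadratic weight; 0 on the diagonal
            disagreement += weight * observed[i][j]
            chance += weight * row_totals[i] * col_totals[j] / n

    if chance == 0:
        # Only one score was ever used, so kappa is 0/0 (undefined)
        return math.nan
    return 1.0 - disagreement / chance

all_zero = quadratic_weighted_kappa([0] * 7, [0] * 7)
identical = quadratic_weighted_kappa([0, 1, 2, 0, 1, 2, 0], [0, 1, 2, 0, 1, 2, 0])
one_off = quadratic_weighted_kappa([0] * 7, [0, 0, 0, 1, 0, 0, 0])
print(all_zero, identical, one_off)
```

The third call shows how a score of 0.0 can arise: when one round is constant and the other differs in a single specimen, the observed disagreement equals the disagreement expected by chance, even though six of seven pairs agree.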


In [None]:
cnm_Cohen_weighted_output.replace(np.nan, 1, inplace=True)

In [None]:
cnm_Cohen_weighted_output.head(15)

Unnamed: 0,Trait Code,n pairs,Cohen Score
0,SONR,7,0.81081
1,SONL,7,0.65
2,SOFR,7,1.0
3,SOFL,7,1.0
4,IFSR,7,1.0
5,IFSL,7,1.0
6,MIFR,7,0.90411
7,MIFL,7,0.46154
8,ZFFR,7,1.0
9,ZFFL,7,1.0


#### Add 'Level of Agreement' column based on 'Cohen Score' column

In [None]:
score = cnm_Cohen_weighted_output['Cohen Score']
conditions = [
    (score <= 0.2),
    (score > 0.2) & (score < 0.4),
    (score >= 0.4) & (score < 0.6),
    (score >= 0.6) & (score < 0.8),
    (score >= 0.8) & (score <= 0.9),
    (score > 0.9)]
choices = ['None', 'Minimal', 'Weak', 'Moderate', 'Strong', 'Almost Perfect']
# The bands are contiguous, so every non-NaN score matches exactly one condition.
# A string default keeps the result dtype consistent with the string choices.
cnm_Cohen_weighted_output['Level of Agreement'] = np.select(conditions, choices, default='Undefined')
print(cnm_Cohen_weighted_output)

   Trait Code n pairs  Cohen Score Level of Agreement           Cohen Type
0        SONR       7      0.81081             Strong  Weighted, Quadratic
1        SONL       7      0.65000           Moderate  Weighted, Quadratic
2        SOFR       7      1.00000     Almost Perfect  Weighted, Quadratic
3        SOFL       7      1.00000     Almost Perfect  Weighted, Quadratic
4        IFSR       7      1.00000     Almost Perfect  Weighted, Quadratic
5        IFSL       7      1.00000     Almost Perfect  Weighted, Quadratic
6        MIFR       7      0.90411     Almost Perfect  Weighted, Quadratic
7        MIFL       7      0.46154               Weak  Weighted, Quadratic
8        ZFFR       7      1.00000     Almost Perfect  Weighted, Quadratic
9        ZFFL       7      1.00000     Almost Perfect  Weighted, Quadratic
10       CIVR       7      1.00000     Almost Perfect  Weighted, Quadratic
11       CIVL       7      0.58824               Weak  Weighted, Quadratic
12       PTBR       7    
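
The band boundaries for 'Level of Agreement' can also be written as a small function, which makes it easy to confirm that every score falls in exactly one band and to test the edge values. This is a minimal sketch, assuming cut-offs at 0.2, 0.4, 0.6, 0.8 and 0.9; `agreement_level` is a hypothetical helper name, and the `'Undefined'` label for NaN is an assumption rather than part of the original analysis.

```python
import math

def agreement_level(kappa):
    """Map a Cohen's kappa score to a descriptive agreement band."""
    if math.isnan(kappa):
        # NaN fails every comparison below, so it must be handled first
        return 'Undefined'
    if kappa <= 0.2:
        return 'None'
    if kappa < 0.4:
        return 'Minimal'
    if kappa < 0.6:
        return 'Weak'
    if kappa < 0.8:
        return 'Moderate'
    if kappa <= 0.9:
        return 'Strong'
    return 'Almost Perfect'

# Scores taken from the weighted results table, plus two values near band edges
examples = [0.0, 0.205, 0.395, 0.46154, 0.65, 0.81081, 0.90411, 1.0]
levels = [agreement_level(k) for k in examples]
print(list(zip(examples, levels)))
```

With a DataFrame, the same function could be applied as `df['Cohen Score'].apply(agreement_level)`.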

In [None]:
cnm_Cohen_weighted_output.sort_values(by='Cohen Score', ascending=True)

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type
40,SAGB,7,0.0,,"Weighted, Quadratic"
7,MIFL,7,0.46154,Weak,"Weighted, Quadratic"
12,PTBR,7,0.58824,Weak,"Weighted, Quadratic"
11,CIVL,7,0.58824,Weak,"Weighted, Quadratic"
1,SONL,7,0.65,Moderate,"Weighted, Quadratic"
24,PZTR,7,0.67692,Moderate,"Weighted, Quadratic"
23,NAW,7,0.77419,Moderate,"Weighted, Quadratic"
30,HYPR,7,0.77419,Moderate,"Weighted, Quadratic"
21,MTR,7,0.79412,Moderate,"Weighted, Quadratic"
33,APFL,7,0.8,Strong,"Weighted, Quadratic"


In [None]:
cnm_Cohen_weighted_output['Cohen Type'] = 'Weighted, Quadratic'

In [None]:
cnm_Cohen_weighted_output.head(15)

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type
0,SONR,7,0.81081,Strong,"Weighted, Quadratic"
1,SONL,7,0.65,Moderate,"Weighted, Quadratic"
2,SOFR,7,1.0,Almost Perfect,"Weighted, Quadratic"
3,SOFL,7,1.0,Almost Perfect,"Weighted, Quadratic"
4,IFSR,7,1.0,Almost Perfect,"Weighted, Quadratic"
5,IFSL,7,1.0,Almost Perfect,"Weighted, Quadratic"
6,MIFR,7,0.90411,Almost Perfect,"Weighted, Quadratic"
7,MIFL,7,0.46154,Weak,"Weighted, Quadratic"
8,ZFFR,7,1.0,Almost Perfect,"Weighted, Quadratic"
9,ZFFL,7,1.0,Almost Perfect,"Weighted, Quadratic"


#### Results (weighted) file:
cnm_Cohen_weighted_output

---

### Cohen's unweighted kappa: Function and loop

In [None]:
def run_cohen_kappa_unweighted(index):
    """Return the unweighted Cohen's kappa for the trait at column position `index`."""
    column1 = list(round1_unweighted.columns)
    column2 = list(round2_unweighted.columns)

    # Columns are paired by position, so both rounds must list the traits in the same order
    y = cohen_kappa_score(round1_unweighted[column1[index]], round2_unweighted[column2[index]], weights=None)
    print(y)
    return y

In [None]:
column_names = ['Trait Code', 'n pairs', 'Cohen Score']                 # create column names for new df
cnm_Cohen_unweighted_output = pd.DataFrame(columns = column_names)      # create new df for Cohen results
column1 = list(round1_unweighted.columns)

for i in range(len(column1)):
    Cohen_score = run_cohen_kappa_unweighted(i)
    obj1 = round1_unweighted[column1[i]].size                           # number of rows (pairs) for this trait
    cnm_Cohen_unweighted_output.loc[i] = [column1[i], obj1, round(Cohen_score, 5)]

1.0
1.0
nan
1.0
1.0
nan
1.0
0.7666666666666666
0.78125
0.7407407407407407
1.0
1.0
0.7407407407407407
0.6956521739130435
0.588235294117647
0.5
0.78125
1.0
1.0
0.6956521739130435
0.611111111111111
0.16000000000000003
0.6956521739130435
1.0
1.0
nan
nan
nan
1.0
1.0
1.0
nan
0.588235294117647
1.0
nan
nan
1.0
1.0
nan
1.0
1.0
1.0

  k = np.sum(w_mat * confusion) / np.sum(w_mat * expected)



0.6956521739130435
1.0
0.36363636363636376
0.72
1.0


In [None]:
# print entire unweighted output

print(cnm_Cohen_unweighted_output)

   Trait Code n pairs  Cohen Score
0        CCOR       7      1.00000
1        CCOL       7      1.00000
2        SSSF       7          NaN
3        FOIR       7      1.00000
4        FOIL       7      1.00000
5        FSIR       7          NaN
6        FSIL       7      1.00000
7         NAS       7      0.76667
8         NBC       7      0.78125
9         NBS       7      0.74074
10        NOR       7      1.00000
11        NOL       7      1.00000
12        NFS       7      0.74074
13        OBS       7      0.69565
14        PBD       7      0.58824
15        SPS       7      0.50000
16        TPS       7      0.78125
17         PS       7      1.00000
18        PFR       7      1.00000
19        PFL       7      0.69565
20      MFLoR       7      0.61111
21      MFLoL       7      0.16000
22       CRBR       7      0.69565
23       CRBL       7      1.00000
24       EPBR       7      1.00000
25       EPBL       7          NaN
26       FTAR       7          NaN
27       FTAL       

##### Replace the NaN Cohen scores with 1.0
* Cohen's kappa is undefined when all values from both rounds for a single trait are 0: the agreement expected by chance then equals the observed agreement, and the calculation reduces to 0/0. The cohen_kappa_score function from sklearn therefore returns NaN for those traits, which you must override to 1. The logic behind this is: if all values from both observation rounds are 0, then the Level of Agreement is perfect (1.0).
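
For the unweighted case, the undefined result can be read directly from the standard definition of kappa. The notation below is introduced here for illustration: $p_o$ is the observed proportion of agreement, and $p_{1k}$, $p_{2k}$ are the proportions of score $k$ in round 1 and round 2.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{1k}\, p_{2k}

% All values are 0 in both rounds: p_{10} = p_{20} = 1, so p_o = 1 and p_e = 1
\kappa = \frac{1 - 1}{1 - 1} = \frac{0}{0} \quad \text{(undefined, reported as NaN)}
```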


In [None]:
cnm_Cohen_unweighted_output.replace(np.nan, 1, inplace=True)

In [None]:
cnm_Cohen_unweighted_output.head(15)

Unnamed: 0,Trait Code,n pairs,Cohen Score
0,CCOR,7,1.0
1,CCOL,7,1.0
2,SSSF,7,1.0
3,FOIR,7,1.0
4,FOIL,7,1.0
5,FSIR,7,1.0
6,FSIL,7,1.0
7,NAS,7,0.76667
8,NBC,7,0.78125
9,NBS,7,0.74074


#### Add 'Level of Agreement' column based on 'Cohen Score' column

In [None]:
score = cnm_Cohen_unweighted_output['Cohen Score']
conditions = [
    (score <= 0.2),
    (score > 0.2) & (score < 0.4),
    (score >= 0.4) & (score < 0.6),
    (score >= 0.6) & (score < 0.8),
    (score >= 0.8) & (score <= 0.9),
    (score > 0.9)]
choices = ['None', 'Minimal', 'Weak', 'Moderate', 'Strong', 'Almost Perfect']
# The bands are contiguous, so every non-NaN score matches exactly one condition.
# A string default keeps the result dtype consistent with the string choices.
cnm_Cohen_unweighted_output['Level of Agreement'] = np.select(conditions, choices, default='Undefined')
print(cnm_Cohen_unweighted_output)

   Trait Code n pairs  Cohen Score Level of Agreement  Cohen Type
0        CCOR       7      1.00000     Almost Perfect  Unweighted
1        CCOL       7      1.00000     Almost Perfect  Unweighted
2        SSSF       7      1.00000     Almost Perfect  Unweighted
3        FOIR       7      1.00000     Almost Perfect  Unweighted
4        FOIL       7      1.00000     Almost Perfect  Unweighted
5        FSIR       7      1.00000     Almost Perfect  Unweighted
6        FSIL       7      1.00000     Almost Perfect  Unweighted
7         NAS       7      0.76667           Moderate  Unweighted
8         NBC       7      0.78125           Moderate  Unweighted
9         NBS       7      0.74074           Moderate  Unweighted
10        NOR       7      1.00000     Almost Perfect  Unweighted
11        NOL       7      1.00000     Almost Perfect  Unweighted
12        NFS       7      0.74074           Moderate  Unweighted
13        OBS       7      0.69565           Moderate  Unweighted
14        

In [None]:
cnm_Cohen_unweighted_output.sort_values(by='Cohen Score', ascending=True)

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type
21,MFLoL,7,0.16,,Unweighted
44,LBSO,7,0.36364,Minimal,Unweighted
15,SPS,7,0.5,Weak,Unweighted
14,PBD,7,0.58824,Weak,Unweighted
32,OMBR,7,0.58824,Weak,Unweighted
20,MFLoR,7,0.61111,Moderate,Unweighted
13,OBS,7,0.69565,Moderate,Unweighted
22,CRBR,7,0.69565,Moderate,Unweighted
19,PFL,7,0.69565,Moderate,Unweighted
42,LBLaR,7,0.69565,Moderate,Unweighted


In [None]:
cnm_Cohen_unweighted_output['Cohen Type'] = 'Unweighted'

In [None]:
cnm_Cohen_unweighted_output.head()

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type
0,CCOR,7,1.0,Almost Perfect,Unweighted
1,CCOL,7,1.0,Almost Perfect,Unweighted
2,SSSF,7,1.0,Almost Perfect,Unweighted
3,FOIR,7,1.0,Almost Perfect,Unweighted
4,FOIL,7,1.0,Almost Perfect,Unweighted


#### Results (unweighted) file:
cnm_Cohen_unweighted_output

## Merge Cohen results (weighted and unweighted)

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows and keeps the original index labels
cnm_Cohen_output = pd.concat([cnm_Cohen_weighted_output, cnm_Cohen_unweighted_output])

In [None]:
cnm_Cohen_output = cnm_Cohen_output.sort_values(by='Cohen Score', ascending=True)

In [None]:
cnm_Cohen_output.head(20)

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type
40,SAGB,7,0.0,,"Weighted, Quadratic"
21,MFLoL,7,0.16,,Unweighted
44,LBSO,7,0.36364,Minimal,Unweighted
7,MIFL,7,0.46154,Weak,"Weighted, Quadratic"
15,SPS,7,0.5,Weak,Unweighted
32,OMBR,7,0.58824,Weak,Unweighted
14,PBD,7,0.58824,Weak,Unweighted
11,CIVL,7,0.58824,Weak,"Weighted, Quadratic"
12,PTBR,7,0.58824,Weak,"Weighted, Quadratic"
20,MFLoR,7,0.61111,Moderate,Unweighted


In [None]:
cnm_Cohen_output.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 40 to 46
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Trait Code          89 non-null     object 
 1   n pairs             89 non-null     object 
 2   Cohen Score         89 non-null     float64
 3   Level of Agreement  89 non-null     object 
 4   Cohen Type          89 non-null     object 
dtypes: float64(1), object(4)
memory usage: 4.2+ KB


## Combine Cohen results table with trait code/name key

### **Input data files:**
*   *trait codes and names.xlsx*
*   This file contains the trait codes and associated trait names for all cnm data.


In [None]:
codes_names = pd.read_excel('trait codes and names.xlsx')
codes_names.head()

Unnamed: 0,Trait Code,Trait Name
0,SONR,Supraorbital notch R
1,SONL,Supraorbital notch L
2,SOFR,Supraorbital foramen R
3,SOFL,Supraorbital foramen L
4,IFSR,Infraorbital Suture R


In [None]:
results_named = pd.merge(cnm_Cohen_output, codes_names, on='Trait Code')
results_named.head()

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type,Trait Name
0,SAGB,7,0.0,,"Weighted, Quadratic",Sagittal oss
1,MFLoL,7,0.16,,Unweighted,Mastoid foramen location L
2,LBSO,7,0.36364,Minimal,Unweighted,Lamb sut oss
3,MIFL,7,0.46154,Weak,"Weighted, Quadratic",Multiple Infraorbital Foramina L
4,SPS,7,0.5,Weak,Unweighted,Supranasal suture
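
Because `pd.merge` performs an inner join by default, any trait code missing from the key file would be dropped from the results without warning. The sketch below shows one way to check for that, using small stand-in frames rather than the real data; `XXXX` is a made-up trait code and the variable names are illustrative.

```python
import pandas as pd

# Toy stand-ins for cnm_Cohen_output and codes_names
scores = pd.DataFrame({
    'Trait Code': ['SONR', 'SONL', 'XXXX'],
    'Cohen Score': [0.81081, 0.65, 1.0],
})
names = pd.DataFrame({
    'Trait Code': ['SONR', 'SONL'],
    'Trait Name': ['Supraorbital notch R', 'Supraorbital notch L'],
})

# A left join keeps every scored trait; the '_merge' column records
# whether a matching row was found in the key file
checked = pd.merge(scores, names, on='Trait Code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Trait Code'].tolist()
print(unmatched)
```

An empty `unmatched` list indicates that every scored trait has a name in the key file, in which case the inner join loses no rows.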


In [None]:
results_named.head(30)

Unnamed: 0,Trait Code,n pairs,Cohen Score,Level of Agreement,Cohen Type,Trait Name
0,SAGB,7,0.0,,"Weighted, Quadratic",Sagittal oss
1,MFLoL,7,0.16,,Unweighted,Mastoid foramen location L
2,LBSO,7,0.36364,Minimal,Unweighted,Lamb sut oss
3,MIFL,7,0.46154,Weak,"Weighted, Quadratic",Multiple Infraorbital Foramina L
4,SPS,7,0.5,Weak,Unweighted,Supranasal suture
5,OMBR,7,0.58824,Weak,Unweighted,Occipito-mastoid suture oss R
6,PBD,7,0.58824,Weak,Unweighted,Postbregmatic depression
7,CIVL,7,0.58824,Weak,"Weighted, Quadratic",Pterygo-spinous Bridge L
8,PTBR,7,0.58824,Weak,"Weighted, Quadratic",Pterygo-alar Bridge R
9,MFLoR,7,0.61111,Moderate,Unweighted,Mastoid foramen location R


In [None]:
results_named.to_excel('/drive/My Drive/Colab Notebooks/Pre-statistical treatments/1_Observer error/cnm_intraobserver_Cohen_results.xlsx', index=False)

### **Output data file:**
*   *cnm_intraobserver_Cohen_results.xlsx*
*   This is the output file from the (Intra)Observer Error folder with combined weighted and unweighted Cohen results

In [None]:
#cnm_Cohen_output.to_csv('/drive/My Drive/Colab Notebooks/Pre-statistical treatments/1_Observer error/cnm_intraobserver_Cohen_results.csv', index=False)

---

## Merge cnm df with demographics

### US

#### **Input data files:**
*   *cnm_US.xlsx*: This file contains the full US cnm data
*   *demographics.xlsx*: This file contains the demographic data

In [None]:
cnm_US = pd.read_excel('cnm_US.xlsx')
cnm_US.info()

In [None]:
cnm_US.tail(15)

In [None]:
demographics = pd.read_excel('demographics.xlsx')

In [None]:
demographics.head()

In [None]:
cnm_US = pd.merge(cnm_US, demographics, on='SkelID')

In [None]:
cnm_US.tail()

In [None]:
cnm_US.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 65 entries, 0 to 64
Data columns (total 97 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   SkelID                            65 non-null     int64  
 1   Collection                        65 non-null     object 
 2   Supraorbital notch R              65 non-null     int64  
 3   Supraorbital notch L              65 non-null     int64  
 4   Supraorbital foramen R            65 non-null     int64  
 5   Supraorbital foramen L            65 non-null     int64  
 6   Infraorbital suture R             65 non-null     int64  
 7   Infraorbital suture L             65 non-null     int64  
 8   Multiple infraorbital foramina R  65 non-null     int64  
 9   Multiple infraorbital foramina L  65 non-null     int64  
 10  Zygomatico-facial foramina R      65 non-null     int64  
 11  Zygomatico-facial foramina L      65 non-null     int64  
 12  Condylar c

---

### Japan

#### **Input data file:**
*   *cnm_Japan.xlsx*
*   This file contains the full Japan cnm data and associated demographics

In [None]:
cnm_Japan = pd.read_excel('cnm_Japan.xlsx')

In [None]:
cnm_Japan['Population'] = 'Japanese'

In [None]:
cnm_Japan['Population2'] = np.nan
cnm_Japan['Population3'] = np.nan
cnm_Japan['Population4'] = np.nan

In [None]:
cnm_Japan.head()

---

### Merge US & Japan

Check that the US and Japan dfs have the same columns
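
Comparing two long `info()` listings by eye is error-prone, so the check can also be done on the column names directly. A minimal sketch in plain Python follows; `compare_columns` is a hypothetical helper, and the two lists are illustrative stand-ins (`ExtraTrait` is made up). With the real data, `list(cnm_US.columns)` and `list(cnm_Japan.columns)` would be passed instead.

```python
def compare_columns(left_cols, right_cols):
    """Return the names found only on the left and only on the right."""
    left, right = set(left_cols), set(right_cols)
    return sorted(left - right), sorted(right - left)

# Illustrative column lists; note that the order differs between them
us_cols = ['SkelID', 'Collection', 'ExtraTrait', 'Sex', 'Age', 'Population']
japan_cols = ['SkelID', 'Collection', 'Sex', 'Age', 'Population']

only_us, only_japan = compare_columns(us_cols, japan_cols)
print('Only in US:', only_us)
print('Only in Japan:', only_japan)
```

Column order does not need to match, since pandas aligns columns by name when frames are stacked; only names present in one frame and not the other matter.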

In [None]:
cnm_Japan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 97 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   SkelID                            32 non-null     int64  
 1   Collection                        32 non-null     object 
 2   Sex                               32 non-null     object 
 3   Age                               32 non-null     int64  
 4   Supraorbital notch R              32 non-null     int64  
 5   Supraorbital notch L              32 non-null     int64  
 6   Supraorbital foramen R            32 non-null     int64  
 7   Supraorbital foramen L            32 non-null     int64  
 8   Infraorbital suture R             32 non-null     int64  
 9   Infraorbital suture L             32 non-null     int64  
 10  Multiple infraorbital foramina R  32 non-null     int64  
 11  Multiple infraorbital foramina L  32 non-null     int64  
 12  Zygomatico

In [None]:
cnm_US.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 65 entries, 0 to 64
Data columns (total 97 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   SkelID                            65 non-null     int64  
 1   Collection                        65 non-null     object 
 2   Supraorbital notch R              65 non-null     int64  
 3   Supraorbital notch L              65 non-null     int64  
 4   Supraorbital foramen R            65 non-null     int64  
 5   Supraorbital foramen L            65 non-null     int64  
 6   Infraorbital suture R             65 non-null     int64  
 7   Infraorbital suture L             65 non-null     int64  
 8   Multiple infraorbital foramina R  65 non-null     int64  
 9   Multiple infraorbital foramina L  65 non-null     int64  
 10  Zygomatico-facial foramina R      65 non-null     int64  
 11  Zygomatico-facial foramina L      65 non-null     int64  
 12  Condylar c

In [None]:
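# DataFrame.append was removed in pandas 2.0; pd.concat stacks the US rows on top of the Japan rows
cnm_merged = pd.concat([cnm_US, cnm_Japan])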

In [None]:
# Renumber the rows 0..n-1 and discard the old index instead of keeping it as a column
cnm_merged = cnm_merged.reset_index(drop=True)

In [None]:
cnm_merged.tail(100)

In [None]:
from google.colab import  drive
drive.mount('/drive')

Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).


In [None]:
cnm_merged.to_csv('/drive/My Drive/Colab Notebooks/Pre-statistical treatments/cnm.csv')

---

## Remove traits

In [None]:
cnm_merged = pd.read_excel('cnm_merged.xlsx')

Check the head and tail to make sure the two dfs were merged properly. The head should contain data from the US Collection, while the tail should contain data from the Japan Collection.

In [None]:
cnm_merged.head(15)

In [None]:
cnm_merged.tail(15)

In [None]:
cnm_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 97 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   SkelID                            97 non-null     int64  
 1   Collection                        97 non-null     object 
 2   Sex                               97 non-null     object 
 3   Age                               97 non-null     int64  
 4   Population                        97 non-null     object 
 5   Population2                       19 non-null     object 
 6   Population3                       10 non-null     object 
 7   Population4                       1 non-null      object 
 8   Supraorbital notch R              97 non-null     int64  
 9   Supraorbital notch L              97 non-null     int64  
 10  Supraorbital foramen R            97 non-null     int64  
 11  Supraorbital foramen L            97 non-null     int64  
 12  Infraorbit

In [None]:
# Rename traits

cnm = cnm_merged.rename(columns={'Metopism': 'METO',
                                  'Inca': 'INCA',
                                  'Occipito-mastoid suture oss R': 'OMBR',
                                  'Occipito-mastoid suture oss L': 'OMBL',
                                  'Asterionic oss R': 'ASTR',
                                  'Asterionic oss L': 'ASTL',
                                  'Parietal notch oss R': 'PNBR',
                                  'Parietal notch oss L': 'PNBL',
                                  'Pharyngeal fossa': 'PHAR',
                                  'Divided hypoglossal canal R': 'HYPR',
                                  'Divided hypoglossal canal L': 'HYPL',
                                  'Tympanic dihiscence R': 'TYMR',
                                  'Tympanic dihiscence L': 'TYML',
                                  'Pterygo-spinous bridge R': 'CIVR',
                                  'Pterygo-spinous bridge L': 'CIVL',
                                  'Pterygo-alar bridge R': 'PTBR',
                                  'Pterygo-alar bridge L': 'PTBL',
                                  'Supraorbital foramen R': 'SOFR',
                                  'Supraorbital foramen L': 'SOFL',
                                  'Supraorbital notch R': 'SONR',
                                  'Supraorbital notch L': 'SONL',
                                  'Acc mental foramen R': 'MENR',
                                  'Acc mental foramen L': 'MENL',
                                  'Mylohyoid bridge R': 'MHBR',
                                  'Mylohyoid bridge L': 'MHBL',
                                  'Parietal foramen R': 'PFR',
                                  'Parietal foramen L': 'PFL',
                                  'Mastoid foramen number R': 'MFR',
                                  'Mastoid foramen number L': 'MFL',
                                  'Coronal oss R': 'CRBR',
                                  'Coronal oss L': 'CRBL',
                                  'Epipteric oss R': 'EPBR',
                                  'Epipteric oss L': 'EPBL',
                                  'Fronto-temp articulation R': 'FTAR',
                                  'Fronto-temp articulation L': 'FTAL',
                                  'Acc lesser palatine foramen R': 'APFR',
                                  'Acc lesser palatine foramen L': 'APFL',
                                  'Infraorbital suture R': 'IFSR',
                                  'Infraorbital suture L': 'IFSL',
                                  'Multiple infraorbital foramina R': 'MIFR',
                                  'Multiple infraorbital foramina L': 'MIFL',
                                  'Sagittal oss': 'SAGB',
                                  'Bregma oss': 'BREG',
                                  'Palatine torus': 'PALT',
                                  'Mandibular torus': 'MANT',
                                  'Flexure of sup sagittal sulcus': 'SSSF',
                                  'Zygomatico-facial foramina R': 'ZFFR',
                                  'Zygomatico-facial foramina L': 'ZFFL',
                                  'Condylar canal R': 'CCOR',
                                  'Condylar canal L': 'CCOL',
                                  'Foramen ovale incomplete R': 'FOIR',
                                  'Foramen ovale incomplete L': 'FOIL',
                                  'Foramen spinosum incomplete R': 'FSIR',
                                  'Foramen spinosum incomplete L': 'FSIL',
                                  'Auditory exostosis R': 'AUDR',
                                  'Auditory exostosis L': 'AUDL',
                                  'Anterior nasal spine': 'ANS',
                                  'Inferior nasal aperture': 'INA',
                                  'Interorbital breadth': 'IOB',
                                  'Malar tubercle R': 'MTR',
                                  'Malar tubercle L': 'MTL',
                                  'Nasal aperture shape': 'NAS',
                                  'Nasal aperture width': 'NAW',
                                  'Nasal bone contour': 'NBC',
                                  'Nasal bone shape': 'NBS',
                                  'Nasal overgrowth R': 'NOR',
                                  'Nasal overgrowth L': 'NOL',
                                  'Nasofrontal suture': 'NFS',
                                  'Orbital shape': 'OBS',
                                  'Postbregmatic depression': 'PBD',
                                  'Posterior zygomatic tubercle R': 'PZTR',
                                  'Posterior zygomatic tubercle L': 'PZTL',
                                  'Supranasal suture': 'SPS',
                                  'Transverse palatine suture': 'TPS',
                                  'Palate shape': 'PS',
                                  'Zygomaticomaxillary suture R': 'ZSR',
                                  'Zygomaticomaxillary suture L': 'ZSL',
                                  'Mastoid foramen location R': 'MFLoR',
                                  'Mastoid foramen location L': 'MFLoL',
                                  'Lamb oss med R': 'LBMR',
                                  'Lamb oss med L': 'LBML',
                                  'Lamb oss lat R': 'LBLaR',
                                  'Lamb oss lat L': 'LBLaL',
                                  'Lamb sut oss': 'LBSO',
                                  'Apical bone': 'APIC',
                                  'Frontal foramina R': 'FFR',
                                  'Frontal foramina L': 'FFL',
                                  'Male vermiculate pattern': 'MVP',
                                  'Twigs': 'TWIG'})
cnm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 97 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   SkelID       97 non-null     int64  
 1   Collection   97 non-null     object 
 2   Sex          97 non-null     object 
 3   Age          97 non-null     int64  
 4   Population   97 non-null     object 
 5   Population2  19 non-null     object 
 6   Population3  10 non-null     object 
 7   Population4  1 non-null      object 
 8   SONR         97 non-null     int64  
 9   SONL         97 non-null     int64  
 10  SOFR         97 non-null     int64  
 11  SOFL         97 non-null     int64  
 12  IFSR         97 non-null     int64  
 13  IFSL         97 non-null     int64  
 14  MIFR         97 non-null     int64  
 15  MIFL         97 non-null     int64  
 16  ZFFR         97 non-null     int64  
 17  ZFFL         97 non-null     int64  
 18  CCOR         97 non-null     int64  
 19  CCOL      
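
`DataFrame.rename` skips mapping keys that do not match any column, so a spelling difference between the mapping and the spreadsheet header leaves that column with its long name and raises no error. The sketch below lists such keys before renaming; the frame and mapping are toy examples, and the third key is deliberately absent from the frame to show the effect.

```python
import pandas as pd

# Toy frame with two of the long trait names as column headers
df = pd.DataFrame({'Metopism': [0, 1], 'Inca': [0, 0]})

# The last key does not exist in df (hypothetical mismatch)
mapping = {'Metopism': 'METO', 'Inca': 'INCA', 'Not a column': 'NOPE'}

# Keys that rename() would silently skip
missing = [old for old in mapping if old not in df.columns]
print(missing)

renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```

Passing `errors='raise'` to `rename` is an alternative that stops with a `KeyError` when a key is missing.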

In [None]:
cnm.head()

Traits to be removed:
*   SAGB
*   MFLoL
*   LBSO
*   MIFL
*   SPS
*   OMBR
*   PBD
*   CIVL
*   PTBR
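
The list above matches the traits whose Cohen score falls below 0.6 in the merged results table, so it could be derived from the scores rather than typed by hand. The sketch below assumes that 0.6 cut-off (it is inferred from the table, not stated in the analysis) and uses the ten lowest rows of the results as sample data; `THRESHOLD` and `low_agreement` are illustrative names.

```python
import pandas as pd

# Ten lowest-scoring traits, copied from the merged Cohen results table
results = pd.DataFrame({
    'Trait Code': ['SAGB', 'MFLoL', 'LBSO', 'MIFL', 'SPS',
                   'OMBR', 'PBD', 'CIVL', 'PTBR', 'MFLoR'],
    'Cohen Score': [0.0, 0.16, 0.36364, 0.46154, 0.5,
                    0.58824, 0.58824, 0.58824, 0.58824, 0.61111],
})

THRESHOLD = 0.6  # assumed cut-off for "low" intraobserver agreement

# Trait codes whose score falls below the cut-off
low_agreement = results.loc[results['Cohen Score'] < THRESHOLD, 'Trait Code'].tolist()
print(low_agreement)
```

The resulting list could then be passed to `cnm.drop(columns=low_agreement)` to remove all of those traits in one step.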


#### Remove traits with low intraobserver agreement

In [None]:
## Remove traits with low intraobserver agreement:

del cnm['SAGB']
del cnm['MFLoL']
del cnm['LBSO']
del cnm['MIFL']
del cnm['SPS']
del cnm['OMBR']
del cnm['PBD']
del cnm['CIVL']
del cnm['PTBR']

#### Remove traits not in both US and Japan collections


In [None]:
## Remove traits not in both US and Japan collections
# PS (Palate shape)
# MVP (Male vermiculate pattern)
# TWIG (Twigs)

del cnm['PS']
del cnm['MVP']
del cnm['TWIG']

In [None]:
cnm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 85 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   SkelID       97 non-null     int64  
 1   Collection   97 non-null     object 
 2   Sex          97 non-null     object 
 3   Age          97 non-null     int64  
 4   Population   97 non-null     object 
 5   Population2  19 non-null     object 
 6   Population3  10 non-null     object 
 7   Population4  1 non-null      object 
 8   SONR         97 non-null     int64  
 9   SONL         97 non-null     int64  
 10  SOFR         97 non-null     int64  
 11  SOFL         97 non-null     int64  
 12  IFSR         97 non-null     int64  
 13  IFSL         97 non-null     int64  
 14  MIFR         97 non-null     int64  
 15  ZFFR         97 non-null     int64  
 16  ZFFL         97 non-null     int64  
 17  CCOR         97 non-null     int64  
 18  CCOL         97 non-null     int64  
 19  SSSF      

In [None]:
cnm.to_excel('/drive/My Drive/Colab Notebooks/Pre-statistical treatments/1_Observer error/cnm_cleaned.xlsx', index=False)

### **Output data file:**
*   *cnm_cleaned.xlsx*
*   This is the output file from the (Intra)Observer Error folder with:
1.   traits renamed using trait codes
2.   traits with low intraobserver (IO) agreement removed
3.   traits not present in both the US and Japan collections removed