![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# Gender Classifier 

**Gender Classifier** detects the gender of the patient in a clinical document and classifies the document as `Female`, `Male` or `Unknown`.


- **`classifierdl_gender_sbert`** (uses the licensed **`sbiobert_base_cased_mli`** sentence embeddings)

It was trained on more than four thousand internally annotated clinical documents (radiology reports, pathology reports, clinical visit notes, etc.).

## **Setup**

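```python
import json
import os
import warnings

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline, PipelineModel

# Silence Python warnings and TensorFlow's C++ logging
warnings.filterwarnings('ignore')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

from johnsnowlabs import nlp, medical

# Start (or reuse) a Spark session with the John Snow Labs libraries loaded;
# the clinical models used below require a valid license
spark = nlp.start()
spark.sparkContext.setLogLevel("ERROR")

spark
```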

Spark Session already created, some configs may not take.




## **Gender Classifier Pipeline with SBERT**

```python
# Wrap the raw input text into DOCUMENT annotations
document = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Compute one sentence embedding per document with the licensed sbiobert_base_cased_mli model
sbert_embedder = nlp.BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")\
    .setMaxSentenceLength(512)

# Pretrained gender classifier that outputs Female / Male / Unknown
gender_classifier = nlp.ClassifierDLModel.pretrained("classifierdl_gender_sbert", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")

gender_pred_pipeline_sbert = nlp.Pipeline(stages=[
    document,
    sbert_embedder,
    gender_classifier
])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the PipelineModel
empty_data = spark.createDataFrame([[""]]).toDF("text")

model_sbert = gender_pred_pipeline_sbert.fit(empty_data)
```


sbiobert_base_cased_mli download started this may take some time.
Approximate size to download 384.3 MB
Download done! Loading the resource.
[OK!]
classifierdl_gender_sbert download started this may take some time.
Approximate size to download 22.2 MB
Download done! Loading the resource.
[OK!]


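```python
text = """social history: shows that  does not smoke cigarettes or drink alcohol,lives in a nursing home.family history: shows a family history of breast cancer."""

# LightPipeline runs the fitted model directly on Python strings, without building a DataFrame
gender_pipeline_sbert = nlp.LightPipeline(model_sbert)

# annotate() returns a dict keyed by output column; "class" holds the predicted label
result = gender_pipeline_sbert.annotate(text)

result['class'][0]
```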


'Female'
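Besides a single string, `LightPipeline.annotate` accepts a list of strings and returns one result dict per text, in input order. The sketch below uses that to classify several notes in one call instead of one cell per note. The helper names `first_class` and `classify_batch` are hypothetical, and returning `None` when no `class` annotation is produced is an assumption chosen so that a missing prediction is not confused with the model's own `Unknown` label.

```python
from typing import Callable, Dict, List, Mapping, Optional


def first_class(result: Mapping[str, List[str]], column: str = "class") -> Optional[str]:
    """Return the first label stored under `column`, or None if there is none."""
    labels = result.get(column) or []
    return labels[0] if labels else None


def classify_batch(
    annotate: Callable[[List[str]], List[Mapping[str, List[str]]]],
    notes: Mapping[str, str],
) -> Dict[str, Optional[str]]:
    """Classify several notes with a single annotate() call.

    `annotate` is assumed to behave like LightPipeline.annotate on a list:
    it returns one result dict per input text, in the same order.
    """
    names = list(notes)
    results = annotate([notes[name] for name in names])
    if len(results) != len(names):
        raise ValueError("annotate() returned a different number of results than inputs")
    # Map each note name to its first predicted label (or None)
    return {name: first_class(res) for name, res in zip(names, results)}
```

For example, `classify_batch(gender_pipeline_sbert.annotate, {"text": text})` returns a dict with one label per key, so any number of notes can be passed at once.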

### **Sample Clinical Notes**

```python
text1 = '''social history: shows that  does not smoke cigarettes or drink alcohol,lives in a nursing home.
family history: shows a family history of breast cancer.'''

result = gender_pipeline_sbert.annotate(text1)

result['class'][0]
```

'Female'

```python
text2 = '''The patient is a 48- year-old, with severe mitral stenosis diagnosed by echocardiography, moderate
 aortic insufficiency and moderate to severe pulmonary hypertension who is being evaluated as a part of a preoperative 
 workup for mitral and possible aortic valve repair or replacement.'''

result = gender_pipeline_sbert.annotate(text2)

result['class'][0]
```

'Unknown'
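`annotate` keeps only the winning label. `LightPipeline.fullAnnotate` instead returns `Annotation` objects, and for ClassifierDL models the annotation's `metadata` is expected to carry one score per class (stored as strings) next to bookkeeping keys such as `sentence`; treat that layout as an assumption and inspect it in your version. Under that assumption, the hypothetical helper `label_with_threshold` below picks the best-scoring class and falls back to `Unknown` when that score is below a threshold, so borderline notes are reported as undetermined rather than guessed. The 0.5 default is arbitrary, not a tuned value.

```python
from typing import Iterable, Mapping

# The three classes emitted by classifierdl_gender_sbert
LABELS = ("Female", "Male", "Unknown")


def label_with_threshold(
    metadata: Mapping[str, str],
    threshold: float = 0.5,
    labels: Iterable[str] = LABELS,
    fallback: str = "Unknown",
) -> str:
    """Return the best-scoring label, or `fallback` if its score is below `threshold`.

    Assumes per-class scores are stored as strings under the class names;
    any other keys (e.g. "sentence") are ignored.
    """
    scores = {label: float(metadata[label]) for label in labels if label in metadata}
    if not scores:
        return fallback
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] >= threshold else fallback
```

Assuming the metadata layout above, it could be applied as `meta = gender_pipeline_sbert.fullAnnotate(text2)[0]['class'][0].metadata` followed by `label_with_threshold(meta, threshold=0.7)`.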

```python
text3 = '''HISTORY: The patient is a 57-year-old XX, who I initially saw in the office on 12/27/07, as a referral from the Tomball Breast Center.
On 12/21/07, the patient underwent image-guided needle core biopsy of a 1.5 cm lesion at the 7 o'clock position of the left breast (inferomedial). 
The biopsy returned showing infiltrating ductal carcinoma high histologic grade.
The patient stated that xx had recently felt and her physician had felt a palpable mass in that area prior to her breast imaging.'''

result = gender_pipeline_sbert.annotate(text3)

result['class'][0]
```

'Female'

```python
text4 = '''The patient states that xx has been overweight for approximately 35 years and has tried multiple weight loss modalities in 
the past including Weight Watchers, NutriSystem, Jenny Craig, TOPS, cabbage diet, grape fruit diet, Slim-Fast, Richard Simmons,
as well as over-the-counter  measures without any long-term sustainable weight loss.
At the time of presentation to the practice, xx is 5 feet 6 inches tall with a weight of 285.4 pounds and a body mass index of 46.
xx has obesity-related comorbidities, which includes hypertension and hypercholesterolemia.'''

result = gender_pipeline_sbert.annotate(text4)

result['class'][0]
```

'Unknown'

```python
text5 = '''Prostate gland showing moderately differentiated infiltrating adenocarcinoma, 
Gleason 3 + 2 extending to the apex involving both lobes of the prostate, mainly right.'''

result = gender_pipeline_sbert.annotate(text5)

result['class'][0]
```

'Male'

```python
text6 = '''SKIN: The patient has significant subcutaneous emphysema of the upper chest and 
anterior neck area although he states that the subcutaneous emphysema has improved significantly since yesterday.'''

result = gender_pipeline_sbert.annotate(text6)

result['class'][0]
```

'Male'

```python
text7 = '''INDICATION: The patient is a 42-year-old XX who is five days out from transanal excision of a benign anterior base lesion.
xx presents today with diarrhea and bleeding. Digital exam reveals bright red blood on the finger.
xx is for exam under anesthesia and control of hemorrhage at this time.
'''

result = gender_pipeline_sbert.annotate(text7)

result['class'][0]
```

'Male'

```python
text8 = '''INDICATION: ___ year old patient with complicated medical history of paraplegia
and chronic indwelling foley, recurrent MDR UTIs, hx Gallbladder fossa
abscess,type 2 DM, HTN, CAD, DVT s/p left AKA complicated complicated by
respiratory failure requiring tracheostomy and PEG placement, right ischium
osteomyelitis due to chronic pressure ulcers with acute shortness of breath...'''

result = gender_pipeline_sbert.annotate(text8)

result['class'][0]
```


'Male'
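Because the model was trained on internally annotated notes, it is worth measuring it on a few notes labeled in your own setting before relying on it. A minimal sketch, assuming you already hold gold labels and predictions as two dicts keyed by note name (the helper `evaluate` is hypothetical): it reports accuracy over the notes present in both dicts and counts each (gold, predicted) pair, which shows which classes get confused with each other.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple


def evaluate(
    gold: Mapping[str, str], predicted: Mapping[str, str]
) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Compare predictions with gold labels on the notes present in both dicts.

    Returns (accuracy, confusion), where confusion[(gold_label, predicted_label)]
    is the number of notes with that pair.
    """
    shared = [name for name in gold if name in predicted]
    if not shared:
        raise ValueError("gold and predicted have no note names in common")
    # Count every (gold, predicted) pair; the diagonal pairs are the correct ones
    confusion = Counter((gold[name], predicted[name]) for name in shared)
    correct = sum(n for (g, p), n in confusion.items() if g == p)
    return correct / len(shared), dict(confusion)
```

For example, `predicted = {"text8": gender_pipeline_sbert.annotate(text8)['class'][0]}` and a hand-labeled `gold` dict with the same keys can be passed straight to `evaluate(gold, predicted)`.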