# scispaCy
scispaCy is a Python package containing spaCy models for processing biomedical, scientific, or clinical text. More information is available at https://allenai.github.io/scispacy/
spaCy already makes tokenization, NER, cleaning, vectorization, and similar tasks easy. scispaCy adds robust pre-trained pipelines that are ready to use on biomedical text. Let's see how it works.

# Available pipelines

1. en_core_sci_sm: A full spaCy pipeline for biomedical data.
2. en_core_sci_md: A full spaCy pipeline for biomedical data with a larger vocabulary and 50k word vectors.
3. en_core_sci_lg: A full spaCy pipeline for biomedical data with a larger vocabulary and 600k word vectors.
4. en_ner_craft_md: A spaCy NER model trained on the CRAFT corpus.
5. en_ner_jnlpba_md: A spaCy NER model trained on the JNLPBA corpus.
6. en_ner_bc5cdr_md: A spaCy NER model trained on the BC5CDR corpus.
7. en_ner_bionlp13cg_md: A spaCy NER model trained on the BIONLP13CG corpus.
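
The install cell in this notebook fetches `en_core_sci_sm` from a versioned release URL. The sketch below builds the same kind of URL for any pipeline in the list. It assumes that every model is published under the same path pattern and version (v0.2.5) as `en_core_sci_sm`, which the list itself does not confirm; `model_url` is a hypothetical helper name, not part of scispaCy.

```python
# Base path copied from the pip install line used in this notebook.
BASE = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases"

PIPELINES = [
    "en_core_sci_sm",
    "en_core_sci_md",
    "en_core_sci_lg",
    "en_ner_craft_md",
    "en_ner_jnlpba_md",
    "en_ner_bc5cdr_md",
    "en_ner_bionlp13cg_md",
]


def model_url(name, version="0.2.5"):
    """Build a release URL (assumption: all models share one path pattern)."""
    if name not in PIPELINES:
        raise ValueError(f"unknown pipeline: {name}")
    return f"{BASE}/v{version}/{name}-{version}.tar.gz"


print(model_url("en_core_sci_sm"))
```

The resulting string can be passed to `pip install` in the same way as the URL in the install cell.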

### Importing required libraries

In [1]:
#!pip install -U spacy
#!pip install scispacy
#!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
import spacy
import scispacy
import en_core_sci_sm
from spacy import displacy
from scispacy.abbreviation import AbbreviationDetector
from scispacy.linking import EntityLinker
from scispacy.umls_linking import UmlsEntityLinker
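
The imports above stop with a `ModuleNotFoundError` if a package or model is not installed. As a minimal sketch, the standard library can report what is missing before any import is attempted; `missing_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


required = ["spacy", "scispacy", "en_core_sci_sm"]
missing = missing_packages(required)
print("missing:", missing if missing else "none")
```

Anything reported as missing can be installed with the commented `pip install` lines at the top of the cell.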

### Class Medico_Extract

In [15]:
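class Medico_Extract:
    def __init__(self, sentence, pre_trained_model_type):
        # sentence: raw text to analyse
        # pre_trained_model_type: name of an installed pipeline, e.g. "en_core_sci_sm"
        self.sentence = sentence
        self.pre_trained_model_type = pre_trained_model_type

    def Show_entities(self):
        # Load the pipeline and run it on the text; entities are in document.ents
        nlp = spacy.load(self.pre_trained_model_type)
        document = nlp(self.sentence)
        return document, nlp

    def Show_dependency_parses(self, sents):
        # Render the dependency parse of the first sentence inline in Jupyter
        displacy.render(next(sents), style='dep', jupyter=True)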
    
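    def Show_abbrevation_of_entities(self, nlp, document):
        # The detector has to be in the pipeline before the text is processed,
        # otherwise doc._.abbreviations stays empty.
        abbreviation_pipe = AbbreviationDetector(nlp)
        try:
            nlp.add_pipe(abbreviation_pipe)  # spaCy 2.x API: takes the component object
        except ValueError:
            print("Pipeline already present")
        # Re-run the pipeline on the raw text of the Doc that was passed in
        doc = nlp(document.text)
        abb = []
        for i in doc._.abbreviations:
            print("The Full-form of",i,"is:",i._.long_form,".It's starting index is",i.start,"and ending index is",i.end)
            # start and end are token indices of the short form
            abb.append((i.text, i._.long_form.text))
        return abb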
    
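    def Show_disp_entities(self, doc):
        # displacy.serve starts a local web server and blocks until it is interrupted
        displacy.serve(doc, style="ent")

    def Show_tables(self, ents):
        # Map entity text to its label; repeated mentions share one key
        values = {}
        for x in ents:
            values[x.text] = x.label_
        return values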
    
    def Show_Linking(self, doc_ents):
        # The linker has to be in the pipeline when the text is processed, so the
        # text is run again here; doc_ents is kept only for call compatibility.
        nlp = spacy.load(self.pre_trained_model_type)
        linker = UmlsEntityLinker(resolve_abbreviations=True, filter_for_definitions=False)
        nlp.add_pipe(linker)  # spaCy 2.x API: takes the component object
        for entity in nlp(self.sentence).ents:
            print("Name: ", entity)
            # Each kb_ents item starts with a CUI; look it up in the knowledge base
            for umls_ent in entity._.kb_ents:
                print(linker.kb.cui_to_entity[umls_ent[0]])

### Assigning text 

In [17]:
text = '''Myeloid derived suppressor cells (MDSC) are immature 
          myeloid cells with immunosuppressive activity. 
          They accumulate in tumor-bearing mice and humans 
          with different types of cancer, including hepatocellular 
          carcinoma (HCC).'''
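
The triple-quoted string keeps its line breaks and indentation, which is why entity text such as `'hepatocellular \n          carcinoma'` appears in the outputs later in this notebook. A minimal sketch of collapsing the whitespace before the text reaches the pipeline is shown here; `clean_text` is a name introduced for illustration, and the token indices reported later would likely differ if it were used instead of `text`.

```python
text = '''Myeloid derived suppressor cells (MDSC) are immature 
          myeloid cells with immunosuppressive activity. 
          They accumulate in tumor-bearing mice and humans 
          with different types of cancer, including hepatocellular 
          carcinoma (HCC).'''

# str.split() with no arguments splits on any run of whitespace,
# so joining with single spaces removes newlines and indentation.
clean_text = " ".join(text.split())
print(clean_text)
```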

In [18]:
nlp = spacy.load("en_core_sci_sm")

### Initialize Medico_Extract with the text and the pipeline name as arguments

In [19]:
p1=Medico_Extract(text,"en_core_sci_sm")

### Entities

In [20]:
doc,nlp=p1.Show_entities(); print(list(doc.ents))

[Myeloid derived suppressor cells, MDSC, immature, immunosuppressive activity, accumulate, tumor-bearing mice, humans, cancer, hepatocellular 
          carcinoma, HCC]


### Dependency parses 

In [7]:
p1.Show_dependency_parses((doc.sents))

### Abbreviations

The full form of each abbreviation is displayed.

In [303]:
p1.Show_abbrevation_of_entities(nlp,doc)          

The Full-form of MDSC is: Myeloid derived suppressor cells .It's starting index is 5 and ending index is 6
The Full-form of HCC is: hepatocellular 
          carcinoma .It's starting index is 36 and ending index is 37
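
The scispaCy README attributes `AbbreviationDetector` to the Schwartz and Hearst (2003) abbreviation algorithm. The sketch below is a simplified, pure-Python rendering of that heuristic, written for illustration only; it is not the library's code, and `best_long_form` and `find_abbreviations` are hypothetical names. It looks for a short form in parentheses and matches its characters right to left against the preceding words.

```python
import re


def best_long_form(short, candidate):
    """Match short-form characters right to left against the candidate text."""
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        c = short[s].lower()
        if not c.isalnum():
            s -= 1
            continue
        # Walk left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != c) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend the match back to the beginning of the word it starts in.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def find_abbreviations(text):
    """Return {short form: long form} for 'long form (SHORT)' patterns."""
    text = " ".join(text.split())  # collapse newlines and indentation
    found = {}
    for m in re.finditer(r"\(([A-Za-z0-9-]{2,10})\)", text):
        short = m.group(1)
        words = text[: m.start()].split()
        # Search window from the paper: min(|short| + 5, 2 * |short|) words
        window = min(len(short) + 5, len(short) * 2)
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form:
            found[short] = long_form
    return found


text = '''Myeloid derived suppressor cells (MDSC) are immature
          myeloid cells with immunosuppressive activity.
          They accumulate in tumor-bearing mice and humans
          with different types of cancer, including hepatocellular
          carcinoma (HCC).'''
print(find_abbreviations(text))
# {'MDSC': 'Myeloid derived suppressor cells', 'HCC': 'hepatocellular carcinoma'}
```

Unlike the detector, this sketch works on characters rather than tokens, so it reports no token indices.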


### Entity labeling

The entities are shown highlighted within the sentence.

In [384]:
p1.Show_disp_entities(doc)


Using the 'ent' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


### Entities in table form

In [318]:
p1.Show_tables(doc.ents)

{'Myeloid derived suppressor cells': 'ENTITY',
 'MDSC': 'ENTITY',
 'immature': 'ENTITY',
 'immunosuppressive activity': 'ENTITY',
 'accumulate': 'ENTITY',
 'tumor-bearing mice': 'ENTITY',
 'humans': 'ENTITY',
 'cancer': 'ENTITY',
 'hepatocellular \n          carcinoma': 'ENTITY',
 'HCC': 'ENTITY'}
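
Because `Show_tables` returns a dict keyed by entity text, repeated mentions of the same string collapse into a single key. The sketch below keeps one row per mention instead. Plain tuples stand in for spaCy spans so that it runs without a model, `entity_rows` is a hypothetical helper name, and the second `MDSC` row is a made-up repeat added only to show the difference.

```python
from collections import Counter


def entity_rows(pairs):
    """Return one (text, label) row per mention, preserving order and repeats."""
    return [(text, label) for text, label in pairs]


# Stand-in data; with spaCy this would be
# [(ent.text, ent.label_) for ent in doc.ents].
pairs = [
    ("Myeloid derived suppressor cells", "ENTITY"),
    ("MDSC", "ENTITY"),
    ("HCC", "ENTITY"),
    ("MDSC", "ENTITY"),  # hypothetical repeated mention
]

rows = entity_rows(pairs)
for text, label in rows:
    print(f"{text:<35}{label}")

print("rows:", len(rows), "dict keys:", len(dict(pairs)))
print(Counter(label for _, label in rows))
```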

### Entity linking

Each entity has its own definition. The definitions of the entities are as follows.
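
In `Show_Linking`, each item of `entity._.kb_ents` is indexed with `[0]` to obtain the CUI. In scispaCy the second element of each item is a similarity score, so candidates can be filtered before they are looked up. The sketch below uses plain tuples in place of real linker output: the CUIs are taken from this notebook's output, while the scores and the `best_candidate` helper are invented for illustration.

```python
def best_candidate(kb_ents, threshold=0.8):
    """Return the highest-scoring (cui, score) pair at or above threshold, else None."""
    kept = [pair for pair in kb_ents if pair[1] >= threshold]
    return max(kept, key=lambda pair: pair[1]) if kept else None


# CUIs from this notebook's output; the scores are made up.
kb_ents = [("C4277543", 0.97), ("C1513790", 0.74), ("C0887899", 0.71)]

print(best_candidate(kb_ents))
print(best_candidate(kb_ents, threshold=0.99))
```

With a real linker, the selected CUI would then be passed to `linker.kb.cui_to_entity` as in the method above.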

In [14]:
p1.Show_Linking(doc.ents)

Name:  Myeloid derived suppressor cells
CUI: C4277543, Name: Myeloid-Derived Suppressor Cells
Definition: A heterogeneous, immature population of myeloid cells that can suppress the activity of T-CELLS and NATURAL KILLER CELLS in the INNATE IMMUNE RESPONSE and ADAPTIVE IMMUNE RESPONSE. They play important roles in ONCOGENESIS; INFLAMMATION; and INFECTION.
TUI(s): T025
Aliases: (total: 10): 
	 Cell, Myeloid-Derived Suppressor, Myeloid Derived Suppressor Cells, Myeloid-Derived Suppressor Cell, Myeloid-Derived Suppressor Cell, Myeloid-Derived Suppressor Cell, Suppressor Cell, Myeloid-Derived, Cells, Myeloid-Derived Suppressor, Suppressor Cells, Myeloid-Derived, MDSCs, MDSC
CUI: C1513790, Name: Negative Regulation of Myeloid Cell Activation
Definition: Myeloid Cell Suppression involves interference with, or restraint of, the production and activity of myeloid cells.
TUI(s): T043
Aliases: (total: 1): 
	 Myeloid Cell Suppression
CUI: C0887899, Name: Myeloid Cells
Definition: The classes of B