# AI - Natural Language Processing
## Part 2 - Functionalize NLP for entities


# ONLY IF NEEDED

## Step 1. Install Spacy

If this first time ever using spacy on this computer, you must first do either the ```!conda install``` or ```!pip install```:

### TURN OFF FOR COLAB
Run for ANACONDA

In [None]:
conda install -c conda-forge spacy

#### Which language model is best for you?
<a href="https://spacy.io/usage/models">https://spacy.io/usage/models</a>

## Step 2. Install language model


### ANACONDA ONLY

In [None]:
conda install -c conda-forge spacy-model-en_core_web_sm

# Import libraries

In [1]:
import spacy
import pandas as pd
import glob

## Import hearings
Download <a href="https://drive.google.com/file/d/1EUYLeHpHAAW2MGsrT6_jov9cJ-IuDLg-/view?usp=sharing">this senate hearing</a> and turn it into a spacy doc.

Create a spreadsheet with columns for the entity, the label, and its meaning.

(remember, you will have to also tap elements from weeks' lessons to accomplish this)

In [2]:
# importing text file through `glob`

senate_hearing = glob.glob("senate-hearing.txt")
senate_hearing

['senate-hearing.txt']

In [3]:
# reading file

with open(senate_hearing[0], "r") as my_text:
    print(my_text.read(200)) # gives you the first 200 characters
    text_file = my_text.read() # saves the entire thing in a variable

[Senate Hearing 118-22]
[From the U.S. Government Publishing Office]


                                                        S. Hrg. 118-22

                   IMPLEMENTING IIJA: PERSPECTIVES ON
   


In [4]:
# importing nlp language

import en_core_web_trf

In [5]:
# creating nlp pipeline

nlp = spacy.load("en_core_web_trf")

  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(
  model.load_state_dict(torch.load(filelike, map_location=device))


In [6]:
# creating spacy doc

doc = nlp(text_file)

In [7]:
# lists to hold items we need from the spacy doc

entities = [ word.text.strip() for word in doc.ents ]
labels = [ word.label_ for word in doc.ents ]
meanings = [ spacy.explain(word.label_) for word in doc.ents ]

In [8]:
# converting separate lists into dictionary

hearing_data = [ {"entity": entity, "label": label, "explanation": meaning} for (entity, label, meaning) in zip(entities, labels, meanings) ]
hearing_data

[{'entity': 'MARCH 15',
  'label': 'DATE',
  'explanation': 'Absolute or relative dates or periods'},
 {'entity': 'the Committee on Environment and Public Works',
  'label': 'ORG',
  'explanation': 'Companies, agencies, institutions, etc.'},
 {'entity': 'the World Wide Web',
  'label': 'ORG',
  'explanation': 'Companies, agencies, institutions, etc.'},
 {'entity': 'WASHINGTON',
  'label': 'GPE',
  'explanation': 'Countries, cities, states'},
 {'entity': 'EIGHTEENTH',
  'label': 'DATE',
  'explanation': 'Absolute or relative dates or periods'},
 {'entity': 'THOMAS R. CARPER',
  'label': 'PERSON',
  'explanation': 'People, including fictional'},
 {'entity': 'Delaware',
  'label': 'GPE',
  'explanation': 'Countries, cities, states'},
 {'entity': 'SHELLEY MOORE CAPITO',
  'label': 'PERSON',
  'explanation': 'People, including fictional'},
 {'entity': 'West Virginia',
  'label': 'GPE',
  'explanation': 'Countries, cities, states'},
 {'entity': 'BENJAMIN L. CARDIN',
  'label': 'PERSON',
  'e

In [9]:
df = pd.DataFrame(hearing_data)
df

Unnamed: 0,entity,label,explanation
0,MARCH 15,DATE,Absolute or relative dates or periods
1,the Committee on Environment and Public Works,ORG,"Companies, agencies, institutions, etc."
2,the World Wide Web,ORG,"Companies, agencies, institutions, etc."
3,WASHINGTON,GPE,"Countries, cities, states"
4,EIGHTEENTH,DATE,Absolute or relative dates or periods
...,...,...,...
1488,Packers,ORG,"Companies, agencies, institutions, etc."
1489,Eagles,ORG,"Companies, agencies, institutions, etc."
1490,the years,DATE,Absolute or relative dates or periods
1491,Green,GPE,"Countries, cities, states"


In [10]:
df.to_csv("senate-hearing-keywords.csv", encoding="utf-8", index=False)