## Exploring Named Entity Recognition with NLTK: A Beginner's Guide

Online news articles contain far more text than most readers can sift through by hand. Our project addresses this with Named Entity Recognition (NER), a technique that lets a program find and categorize key details in text, such as the names of people, places, and organizations.

Our project focuses on news articles. Using the Natural Language Toolkit (NLTK), our system splits the text into tokens, labels each token with its part of speech, and then groups tokens into named entities such as people or locations. Highlighting these entities makes the main points of an article easier to spot, which helps readers, researchers, and analysts digest information quickly.

In essence, our project is about making dense text easier to work with. Whether you're a student, a researcher, or simply curious about current events, applying NER to a news article gives you a quick view of who and what the article is about.

#### Step 1: Importing Necessary Libraries and Resources

In [2]:
from nltk import ne_chunk, pos_tag, word_tokenize
from nltk.tree import Tree

# Read the article text from textfile.txt
filepath = "textfile.txt"

with open(filepath, 'r', encoding='utf-8') as file:
    my_text = file.read()

my_text

'WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement. Barack Obama is the husband of Michelle Obama.'
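The cell above assumes textfile.txt sits in the notebook's working directory. As a minimal sketch, the same read can be wrapped in a small helper that fails with a clearer message when the file is missing. The name load_text is hypothetical and not part of NLTK, and stripping surrounding whitespace is an added assumption.

```python
from pathlib import Path


def load_text(filepath, encoding="utf-8"):
    """Return the contents of a text file, stripped of surrounding whitespace.

    Hypothetical helper for illustration; not part of NLTK.
    """
    path = Path(filepath)
    if not path.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"Expected a text file at {path.resolve()}")
    return path.read_text(encoding=encoding).strip()


# Usage, assuming textfile.txt exists next to the notebook:
# my_text = load_text("textfile.txt")
```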

#### Step 2: Tokenizing the Text
Next, we split the text into tokens: individual words and punctuation marks, each stored as a separate list element. Working with tokens rather than one long string lets the later steps (part-of-speech tagging and entity chunking) examine the text one unit at a time.

Tokenize the text (my_text) and store the resulting tokens in the variable tokenized_words.
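To get a feel for what tokenization does, here is a rough approximation built on Python's standard re module. It is not how word_tokenize works internally; simple_tokenize is a hypothetical name, and the pattern is a simplifying assumption that keeps hyphenated words together and splits off punctuation.

```python
import re


def simple_tokenize(text):
    """Split text into word and punctuation tokens (rough approximation)."""
    # 1) words, optionally joined by internal hyphens or apostrophes
    # 2) runs of two or more dashes, kept as a single token
    # 3) any other single non-space, non-word character (punctuation)
    pattern = r"\w+(?:[-']\w+)*|-{2,}|[^\w\s]"
    return re.findall(pattern, text)


print(simple_tokenize("Barack Obama is the husband of Michelle Obama."))
# ['Barack', 'Obama', 'is', 'the', 'husband', 'of', 'Michelle', 'Obama', '.']
```

One visible difference: this sketch splits an initial such as "E." into 'E' and '.', whereas NLTK's tokenizer keeps 'E.' as one token. Handling abbreviations like that is one reason to rely on NLTK instead of a hand-written pattern.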

In [3]:
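import nltk
# word_tokenize needs the Punkt tokenizer models; download them once.
# Newer NLTK releases may ask for 'punkt_tab' instead (the LookupError names the missing resource).
nltk.download('punkt')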

tokenized_words = word_tokenize(my_text)

tokenized_words

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\nandi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


['WASHINGTON',
 '--',
 'In',
 'the',
 'wake',
 'of',
 'a',
 'string',
 'of',
 'abuses',
 'by',
 'New',
 'York',
 'police',
 'officers',
 'in',
 'the',
 '1990s',
 ',',
 'Loretta',
 'E.',
 'Lynch',
 ',',
 'the',
 'top',
 'federal',
 'prosecutor',
 'in',
 'Brooklyn',
 ',',
 'spoke',
 'forcefully',
 'about',
 'the',
 'pain',
 'of',
 'a',
 'broken',
 'trust',
 'that',
 'African-Americans',
 'felt',
 'and',
 'said',
 'the',
 'responsibility',
 'for',
 'repairing',
 'generations',
 'of',
 'miscommunication',
 'and',
 'mistrust',
 'fell',
 'to',
 'law',
 'enforcement',
 '.',
 'Barack',
 'Obama',
 'is',
 'the',
 'husband',
 'of',
 'Michelle',
 'Obama',
 '.']
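With the tokens in hand, it is tempting to guess at names by grouping consecutive capitalized tokens. The sketch below does exactly that on the first part of the token list. It is a naive heuristic for illustration only, not what ne_chunk does, and capitalized_runs is a hypothetical helper name.

```python
def capitalized_runs(tokens):
    """Join consecutive tokens that start with an uppercase letter."""
    runs, current = [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.append(tok)
        else:
            # A non-capitalized token ends the current run, if any.
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


sample = ['WASHINGTON', '--', 'In', 'the', 'wake', 'of', 'a', 'string', 'of',
          'abuses', 'by', 'New', 'York', 'police', 'officers', 'in', 'the',
          '1990s', ',', 'Loretta', 'E.', 'Lynch', ',']
print(capitalized_runs(sample))
# ['WASHINGTON', 'In', 'New York', 'Loretta E. Lynch']
```

The heuristic finds "New York" and "Loretta E. Lynch", but it also picks up "In" simply because it starts a sentence, and it cannot tell a person from a place. Those gaps are what the pos_tag and ne_chunk functions imported in Step 1 are meant to address.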