In [1]:
text = '''
Thousands of firefighters labored to contain the four wildfires raging across the Los Angeles area Saturday before evening winds forecasted to fan additional flames.
In the Pacific Palisades, incarcerated firefighters dug wide trenches across the charred landscape in an attempt to contain the blaze, which has been called the most destructive in the city’s history. Across the city, in Altadena, first responders dragged hoses over burned-out cars and rebar. In Mandeville Canyon, where the Palisades fire grew closer to the UCLA campus – prompting evacuation orders across the Brentwood and Encino neighborhoods – firefighting planes dropped water and retardant in an aggressive aerial attempt to halt the fire’s path.
As containment levels on the two largest fires grew, city and county officials began the difficult work of identifying victims. At least 16 fatalities were confirmed in the Palisades and Eaton fires Saturday, according to the Los Angeles county medical examiner. Relatives have begun coming forward to identify the victims, which include several older Black residents of Altadena who refused to leave their longtime homes, and multiple people with disabilities or receiving home healthcare, who could not be moved, including an Australian former child actor.
The Los Angeles county sheriff, Robert Luna, said the death toll is expected to rise as authorities deploy search dogs to devastated areas. The sheriff also said 13 people are reported missing.
The fires, which have consumed an area about two and a half times the size of Manhattan, have displaced 200,000 people and destroyed more than 12,000 homes and structures, including entire residential neighborhoods. They have also prompted a political brawl – in both Los Angeles and nationally.
Sunset view of multistory structure burned out except intact spiral staircase
View image in fullscreen
The remains of a building in Pacific Palisades on 9 January 2025. Photograph: Jason Ryan/NurPhoto/Rex/Shutterstock
On Friday, California's governor, Gavin Newsom, ordered an inquiry into LA county’s water management after reports emerged that a critical reservoir was offline when the fires started, leaving some emergency hydrants with low water pressure before running dry. The LA fire chief, Kristin Crowley, has been vocal about how the water supply issues – and budget cuts – “failed” her firefighters.
A spokesperson for the water and power department confirmed the Santa Ynez reservoir, which helps supply water in the Pacific Palisades, had been offline for scheduled maintenance when the fire ignited.
On Saturday, the LA department of public works issued its own statement “correcting misinformation” about the water system.
“Water pressure in the system was lost due to unprecedented and extreme water demand to fight the wildfire without aerial support,” it said. The department “was required to take the Santa Ynez Reservoir out of service to meet safe drinking water regulations,” it added.
The water supply debacle has prompted national debate, with Donald Trump chiming in.
At the same time, nearby blue and red states as well as foreign countries are making their own political statements in their decisions to deploy firefighters to aid California. On Saturday, the Republican Texas governor, Greg Abbott, announced that his state would deploy first responders to left-leaning California – just a day after Mexico and Canada announced both countries would send firefighters to aid the United States even as Trump has threatened to levy tariffs against both.
'''

In [2]:
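import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
# Fetch tokenizer, stopword and WordNet data ('punkt_tab' is needed by newer NLTK releases)
for resource in ('punkt', 'punkt_tab', 'stopwords', 'wordnet'):
    nltk.download(resource, quiet=True)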

In [3]:
sentences = nltk.sent_tokenize(text)
sentences

['\nThousands of firefighters labored to contain the four wildfires raging across the Los Angeles area Saturday before evening winds forecasted to fan additional flames.',
 'In the Pacific Palisades, incarcerated firefighters dug wide trenches across the charred landscape in an attempt to contain the blaze, which has been called the most destructive in the city’s history.',
 'Across the city, in Altadena, first responders dragged hoses over burned-out cars and rebar.',
 'In Mandeville Canyon, where the Palisades fire grew closer to the UCLA campus – prompting evacuation orders across the Brentwood and Encino neighborhoods – firefighting planes dropped water and retardant in an aggressive aerial attempt to halt the fire’s path.',
 'As containment levels on the two largest fires grew, city and county officials began the difficult work of identifying victims.',
 'At least 16 fatalities were confirmed in the Palisades and Eaton fires Saturday, according to the Los Angeles county medical ex

In [6]:
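import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# Build the stopword set once instead of once per word
stop_words = set(stopwords.words('english'))

corpus = []
for sentence in sentences:
    # Replace everything except ASCII letters with spaces (drops digits and punctuation)
    review = re.sub('[^a-zA-Z]', ' ', sentence)
    # Lowercase and split on whitespace
    words = review.lower().split()
    # Drop stopwords, then lemmatize (WordNet treats each word as a noun by default)
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    corpus.append(' '.join(words))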
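To see what the `re.sub('[^a-zA-Z]', ' ', ...)` step does on its own, here is a minimal stdlib-only sketch of the cleaning part of the loop, with the NLTK stopword list and the lemmatizer left out. The helper `clean_tokens` and the small `demo_stops` set are illustrative assumptions, not part of the notebook; NLTK's English stopword list is much longer and also contains single letters such as "s".

```python
import re

def clean_tokens(sentence, stop_words=frozenset()):
    """Hypothetical helper mirroring the loop's cleaning steps (no lemmatization)."""
    # Any character that is not an ASCII letter becomes a space,
    # so digits vanish and "city’s" splits into "city" and "s"
    letters_only = re.sub('[^a-zA-Z]', ' ', sentence)
    return [w for w in letters_only.lower().split() if w not in stop_words]

# Tiny illustrative stopword set (an assumption, not NLTK's list)
demo_stops = {'the', 'in', 'at', 'were', 's'}

print(clean_tokens('At least 16 fatalities were confirmed'))
# → ['at', 'least', 'fatalities', 'were', 'confirmed']
print(clean_tokens('the most destructive in the city’s history', demo_stops))
# → ['most', 'destructive', 'city', 'history']
```

This is why numbers such as "16" never reach the vocabulary, and why possessives only survive as a stray "s" that the stopword filter then removes.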

In [7]:
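from sklearn.feature_extraction.text import CountVectorizer
# binary=True records presence (1) or absence (0) of each term rather than its count
cv = CountVectorizer(binary=True)

x = cv.fit_transform(corpus)  # sparse matrix: one row per sentence, one column per term
cv.vocabulary_                # term -> column index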

{'thousand': 227,
 'firefighter': 90,
 'labored': 128,
 'contain': 42,
 'four': 98,
 'wildfire': 243,
 'raging': 179,
 'across': 2,
 'los': 139,
 'angeles': 11,
 'area': 13,
 'saturday': 204,
 'evening': 80,
 'wind': 244,
 'forecasted': 94,
 'fan': 86,
 'additional': 5,
 'flame': 93,
 'pacific': 167,
 'palisade': 168,
 'incarcerated': 117,
 'dug': 72,
 'wide': 242,
 'trench': 231,
 'charred': 33,
 'landscape': 129,
 'attempt': 14,
 'blaze': 20,
 'called': 28,
 'destructive': 60,
 'city': 37,
 'history': 109,
 'altadena': 10,
 'first': 92,
 'responder': 194,
 'dragged': 67,
 'hose': 111,
 'burned': 26,
 'car': 32,
 'rebar': 180,
 'mandeville': 146,
 'canyon': 31,
 'fire': 89,
 'grew': 104,
 'closer': 38,
 'ucla': 234,
 'campus': 29,
 'prompting': 177,
 'evacuation': 78,
 'order': 165,
 'brentwood': 23,
 'encino': 76,
 'neighborhood': 159,
 'firefighting': 91,
 'plane': 172,
 'dropped': 69,
 'water': 240,
 'retardant': 195,
 'aggressive': 7,
 'aerial': 6,
 'halt': 106,
 'path': 169,
 'co
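The effect of `binary=True`, and the alphabetical column order visible in `cv.vocabulary_` above ('across': 2, 'thousand': 227), are easiest to see on a toy corpus. Below is a minimal stdlib-only sketch of what a bag-of-words fit/transform computes, assuming CountVectorizer's defaults of lowercasing, the token pattern `(?u)\b\w\w+\b`, and column indices assigned in sorted term order. `bag_of_words` is a hypothetical helper, not the scikit-learn API, and it returns dense lists rather than a sparse matrix.

```python
import re

# Default CountVectorizer token pattern: runs of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def bag_of_words(docs, binary=True):
    """Hypothetical stdlib re-creation of a bag-of-words fit/transform."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    # Column indices follow the alphabetical order of the terms
    terms = sorted({t for toks in tokenized for t in toks})
    vocabulary = {term: i for i, term in enumerate(terms)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for t in toks:
            if binary:
                row[vocabulary[t]] = 1   # presence only
            else:
                row[vocabulary[t]] += 1  # raw count
        rows.append(row)
    return vocabulary, rows

docs = ['fire water fire', 'water city']
vocab, binary_rows = bag_of_words(docs)
_, count_rows = bag_of_words(docs, binary=False)
print(vocab)        # {'city': 0, 'fire': 1, 'water': 2}
print(binary_rows)  # [[0, 1, 1], [1, 0, 1]]
print(count_rows)   # [[0, 2, 1], [1, 0, 1]]
```

With `binary=True` the repeated "fire" in the first document still contributes a single 1, which is the behaviour the notebook asks for.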

In [8]:
corpus[0]

'thousand firefighter labored contain four wildfire raging across los angeles area saturday evening wind forecasted fan additional flame'

In [9]:
x[0].toarray()

array([[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
        0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 1, 0, 0, 0, 0]])
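The 0/1 row above is easier to read once its non-zero columns are mapped back through `cv.vocabulary_`. scikit-learn already does this on the real matrix via `cv.inverse_transform(x[0])` (or by indexing `cv.get_feature_names_out()`); the stdlib sketch below only makes the lookup explicit. `row_terms` is a hypothetical helper, and the data are the first 14 columns of `x[0]` printed above together with the four matching entries from the vocabulary output.

```python
def row_terms(row, vocabulary):
    """Hypothetical helper: list the terms whose column is non-zero in a dense row."""
    # Invert term -> index into index -> term
    index_to_term = {i: term for term, i in vocabulary.items()}
    return [index_to_term[i] for i, value in enumerate(row) if value]

# First 14 columns of x[0] and the vocabulary entries for their non-zero indices
first_columns = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
partial_vocab = {'across': 2, 'additional': 5, 'angeles': 11, 'area': 13}

print(row_terms(first_columns, partial_vocab))
# → ['across', 'additional', 'angeles', 'area']
```

All four recovered terms appear in `corpus[0]`, which is a quick sanity check that row 0 of `x` really encodes the first sentence.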