In [1]:
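# Standard library
import re
import string

# NLTK: the tokenizer, POS tagger, and stopword list each need their data
# packages, which can be fetched once with nltk.download()
from nltk import FreqDist, pos_tag
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize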

### Q.1 Copy the text about NASA from Wikipedia and perform the following:

In [2]:
nasa = ''' The National Aeronautics and Space Administration (NASA /ˈnæsə/) is an independent agency of the US federal government responsible for the United States' civil space program and for research in aeronautics and space exploration. Headquartered in Washington, D.C., NASA operates ten field centers across the United States and is organized into mission directorates for Science, Space Operations, Exploration Systems Development, Space Technology, Aeronautics Research, and Mission Support. Established in 1958, NASA succeeded the National Advisory Committee for Aeronautics (NACA) to give the American space development effort a distinct civilian orientation, emphasizing peaceful applications in space science. It has since led most of America's space exploration programs, including Project Mercury, Project Gemini, the 1968–1972 Apollo program missions, the Skylab space station, and the Space Shuttle. The agency maintains major ground and communications infrastructure including the Deep Space Network and the Near Space Network. NASA's science division is focused on better understanding Earth through the Earth Observing System; advancing heliophysics through the efforts of the Science Mission Directorate's Heliophysics Research Program; exploring bodies throughout the Solar System with advanced robotic spacecraft such as New Horizons and planetary rovers such as Perseverance; and researching astrophysics topics, such as the Big Bang, through the James Webb Space Telescope, the four Great Observatories (including the Hubble Space Telescope), and associated programs. The Launch Services Program oversees launch operations for its uncrewed launches. NASA supports the International Space Station (ISS) along with the Commercial Crew Program and oversees the development of the Orion spacecraft and the Space Launch System for the lunar Artemis program. 
It maintains programmatic partnerships with agencies such as ESA, JAXA, CSA, Roscosmos (for ISS operations), NOAA, and the USGS. NASA's missions and media operations—such as NASA TV, Astronomy Picture of the Day, and the NASA+ streaming service—have maintained high public visibility and contributed to spaceflight outreach in the United States and abroad. A subject of numerous major films, NASA has maintained an influence on American popular culture since the Apollo 11 mission in 1969. For FY2022, Congress authorized a $24.041 billion budget, with a civil-service workforce of roughly 18,400; as of 2025, the acting administrator is Sean Duffy.'''

In [3]:
tokens = word_tokenize(nasa)

In [4]:
# 1. Apply POS tagging on this data

pos_tags = pos_tag(tokens)

pos_tags[0:5]

[('The', 'DT'),
 ('National', 'NNP'),
 ('Aeronautics', 'NNP'),
 ('and', 'CC'),
 ('Space', 'NNP')]
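A quick way to summarize tagger output is to count how often each tag occurs. The sketch below is illustrative only: it reuses the five `(word, tag)` pairs shown above rather than calling the tagger, and `sample_tags` and `tag_counts` are names introduced here, not part of the notebook.

```python
from collections import Counter

# The first five (word, tag) pairs from the output above.
sample_tags = [
    ('The', 'DT'),
    ('National', 'NNP'),
    ('Aeronautics', 'NNP'),
    ('and', 'CC'),
    ('Space', 'NNP'),
]

# Count occurrences of each Penn Treebank tag.
tag_counts = Counter(tag for _, tag in sample_tags)
print(tag_counts.most_common())
```

Applied to the full `pos_tags` list, the same expression would give the tag distribution for the whole passage.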

In [5]:
# 2. Remove punctuation symbols

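# Drop tokens that are single ASCII punctuation marks. `in` on a string is a
# substring test, so tokens such as "'s" and "/ˈnæsə/" are kept.
words = [token for token in tokens if token not in string.punctuation]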
text = ' '.join(words)

print('Text without punctuation:\n', text)

Text without punctuation:
 The National Aeronautics and Space Administration NASA /ˈnæsə/ is an independent agency of the US federal government responsible for the United States civil space program and for research in aeronautics and space exploration Headquartered in Washington D.C. NASA operates ten field centers across the United States and is organized into mission directorates for Science Space Operations Exploration Systems Development Space Technology Aeronautics Research and Mission Support Established in 1958 NASA succeeded the National Advisory Committee for Aeronautics NACA to give the American space development effort a distinct civilian orientation emphasizing peaceful applications in space science It has since led most of America 's space exploration programs including Project Mercury Project Gemini the 1968–1972 Apollo program missions the Skylab space station and the Space Shuttle The agency maintains major ground and communications infrastructure including the Deep Spa
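The filter `token not in string.punctuation` is a substring test against an ASCII-only string, so multi-character quote tokens (`word_tokenize` turns double quotes into `` and '') and Unicode dashes are not recognized as punctuation. A minimal sketch of a stricter check, assuming a hypothetical helper `is_punctuation` that is not part of the notebook:

```python
import string
import unicodedata

def is_punctuation(token):
    """Return True when every character of the token is punctuation."""
    return bool(token) and all(
        # string.punctuation covers ASCII symbols such as '$' and '+';
        # Unicode categories starting with 'P' cover dashes, quotes, etc.
        ch in string.punctuation or unicodedata.category(ch).startswith('P')
        for ch in token
    )

# Illustrative tokens; '\u2014' is an em dash.
sample_tokens = ['NASA', ',', "''", '``', '\u2014', "'s", 'D.C.', '1968–1972']
kept = [tok for tok in sample_tokens if not is_punctuation(tok)]
print(kept)  # ['NASA', "'s", 'D.C.', '1968–1972']
```

Tokens that mix letters or digits with punctuation, such as `'s` and `D.C.`, are still kept, which matches the behaviour of the original filter for those cases.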

In [6]:
# 3. Delete the stopwords

tokens = word_tokenize(text)
stop_words = stopwords.words('english')

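# The comparison is case-sensitive and the NLTK list is lowercase, so
# capitalized stopwords such as "The" and "It" are kept.
keywords = [token for token in tokens if token not in stop_words]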

text = ' '.join(keywords)
print('Text after removing stopwords:\n', text)

Text after removing stopwords:
 The National Aeronautics Space Administration NASA /ˈnæsə/ independent agency US federal government responsible United States civil space program research aeronautics space exploration Headquartered Washington D.C. NASA operates ten field centers across United States organized mission directorates Science Space Operations Exploration Systems Development Space Technology Aeronautics Research Mission Support Established 1958 NASA succeeded National Advisory Committee Aeronautics NACA give American space development effort distinct civilian orientation emphasizing peaceful applications space science It since led America 's space exploration programs including Project Mercury Project Gemini 1968–1972 Apollo program missions Skylab space station Space Shuttle The agency maintains major ground communications infrastructure including Deep Space Network Near Space Network NASA 's science division focused better understanding Earth Earth Observing System advancin
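The output still contains "The" and "It" because the stopword comparison is case-sensitive. A minimal sketch of a case-insensitive variant follows; the tiny `stop_words` set and the token list are hypothetical stand-ins so the example runs without NLTK data, and in the notebook one would use `set(stopwords.words('english'))` instead.

```python
# Hypothetical miniature stopword list (stand-in for the NLTK list).
stop_words = {'the', 'it', 'is', 'of'}

# Illustrative tokens built from phrases in the two passages.
tokens = ['The', 'agency', 'maintains', 'the', 'Deep', 'Space', 'Network',
          'It', 'is', 'one', 'of', 'the', 'largest', 'IT', 'hubs']

# Case-sensitive filter, as in the cell above: 'The', 'It', and 'IT' survive.
case_sensitive = [t for t in tokens if t not in stop_words]

# Case-insensitive filter: compare the lowercased token, keep the original.
case_insensitive = [t for t in tokens if t.lower() not in stop_words]

print(case_sensitive)
print(case_insensitive)
```

The trade-off is visible in the second list: lowercasing also removes the acronym "IT", since it collides with the stopword "it". Whether that is acceptable depends on the text.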

In [7]:
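# 5. Find the top three words in the data

# Lowercase and stem each keyword so that variants such as "program" and
# "programs" are counted together.
stemmer = PorterStemmer()
words = [stemmer.stem(word.lower()) for word in keywords]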

freq = FreqDist(words)
print('Top three words:\n', freq.most_common(3))

Top three words:
 [('space', 16), ('nasa', 8), ('program', 8)]
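`FreqDist` subclasses `collections.Counter`, so the counting step can be sketched with the standard library alone. The word list below is a small hypothetical sample of already-stemmed words, not the notebook's data.

```python
from collections import Counter

# Hypothetical, already lowercased and stemmed words.
stemmed = ['space', 'nasa', 'program', 'space', 'nasa',
           'space', 'program', 'agenc']

freq = Counter(stemmed)

# most_common(n) returns the n highest counts; elements with equal counts
# come back in the order they were first encountered.
top_three = freq.most_common(3)
print(top_three)  # [('space', 3), ('nasa', 2), ('program', 2)]
```

The tie-ordering rule is the likely reason `('nasa', 8)` is listed before `('program', 8)` in the output above: "nasa" appears earlier in the text.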


### Q.2 Open the Wikipedia page of Pune and find the adjectives used in the text

In [8]:
pune = '''Pune (Marathi: Puṇē, pronounced [ˈpuɳe] ⓘ POO-nay), previously spelled in English as Poona (the official name until 1978),[15][16] is a city in the state of Maharashtra in the Deccan Plateau in Western India. It is the administrative headquarters of the Pune district, and of Pune division. In terms of the total amount of land under its jurisdiction, Pune is the largest city in Maharashtra by area, with a geographical area of 516.18 km2,[17] though by population it comes in a distant second to Mumbai. According to the 2011 Census of India, Pune has 7.2 million residents in the metropolitan region, making it the seventh-most populous metropolitan area in India.[18] The city of Pune is part of Pune Metropolitan Region.[19] Pune is one of the largest IT hubs in India.[20][21] It is also one of the most important automobile and manufacturing hubs of India. Pune is often referred to as the "Oxford of the East" because of its educational institutions.[22][23][24] It has been ranked "the most liveable city in India" several times.[25][26] Pune at different points in time has been ruled by the Rashtrakuta dynasty, Ahmadnagar Sultanate, the Mughals, and the Adil Shahi dynasty. In the 18th century, the city was part of the Maratha Empire, and the seat of the Peshwas, the prime ministers of the Maratha Empire.[27] Pune was seized by the British East India Company in the Third Anglo-Maratha War; it gained municipal status in 1858, the year in which Crown rule began. Many historical landmarks like Shaniwarwada, Shinde Chhatri, and Vishrambaug Wada date to this era. Historical sites from different eras dot the city. 
Pune has historically been a major cultural centre, with important figures like Dnyaneshwar, Shivaji, Tukaram, Baji Rao I, Balaji Baji Rao, Madhavrao I, Nana Fadnavis, Mahadev Govind Ranade, Gopal Krishna Gokhale, Mahatma Jyotirao Phule, Savitribai Phule, Gopal Ganesh Agarkar, Tarabai Shinde, Dhondo Keshav Karve, and Pandita Ramabai doing their life's work in Pune City or in an area that falls in Pune Metropolitan Region. Pune was a major centre of resistance to British Raj, with people like Gopal Krishna Gokhale, Bal Gangadhar Tilak playing leading roles in struggle for Indian independence in their times.'''

In [18]:
tokens = word_tokenize(pune)

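# Keep alphabetic tokens and lowercase them. The stopword check runs on the
# original casing, so capitalized stopwords such as "It" are not removed.
valid_words = [token.lower() for token in tokens if token.isalpha() and token not in stop_words]

# Tagging the lowercased, filtered tokens removes capitalization and context
# cues, so proper nouns (e.g. 'maharashtra', 'gopal') can be tagged as adjectives.
pos_tags = pos_tag(valid_words)
# Adjective tags in the Penn Treebank tag set start with 'J' (JJ, JJR, JJS).
adjectives = [word for word, tag in pos_tags if tag.startswith('J')]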

print('Adjectives:\n', adjectives)

Adjectives:
 ['english', 'maharashtra', 'deccan', 'western', 'administrative', 'total', 'largest', 'maharashtra', 'geographical', 'distant', 'second', 'metropolitan', 'populous', 'metropolitan', 'metropolitan', 'largest', 'important', 'educational', 'liveable', 'several', 'different', 'rashtrakuta', 'dynasty', 'shahi', 'peshwas', 'prime', 'british', 'india', 'third', 'municipal', 'many', 'historical', 'historical', 'different', 'major', 'cultural', 'important', 'dnyaneshwar', 'shivaji', 'gopal', 'krishna', 'gopal', 'ganesh', 'pune', 'pune', 'metropolitan', 'major', 'british', 'gopal', 'struggle', 'indian']
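The list above includes several proper nouns ('maharashtra', 'gopal', 'shivaji'), which is a plausible side effect of lowercasing and filtering the tokens before tagging. One alternative is to tag the full, original-case token sequence first and filter afterwards. The sketch below assumes a hypothetical helper `find_adjectives` that takes the tagger as an argument; `demo_tagger` is a stand-in lookup table used only so the example runs without NLTK data.

```python
def find_adjectives(tokens, tagger, stop_words=frozenset()):
    """Tag the unmodified tokens first, then filter and lowercase."""
    tagged = tagger(tokens)  # the tagger sees original casing and context
    return [
        word.lower()
        for word, tag in tagged
        if tag.startswith('JJ')              # JJ, JJR, JJS
        and word.isalpha()
        and word.lower() not in stop_words
    ]

def demo_tagger(tokens):
    """Stand-in tagger (hypothetical): a fixed lookup, not a real model."""
    tags = {'Pune': 'NNP', 'is': 'VBZ', 'the': 'DT', 'largest': 'JJS',
            'city': 'NN', 'in': 'IN', 'Western': 'JJ', 'India': 'NNP'}
    return [(tok, tags.get(tok, 'NN')) for tok in tokens]

sample = ['Pune', 'is', 'the', 'largest', 'city', 'in', 'Western', 'India']
print(find_adjectives(sample, demo_tagger))  # ['largest', 'western']
```

In the notebook this would be called as `find_adjectives(word_tokenize(pune), pos_tag, set(stop_words))`. The result would still depend on the tagger's accuracy, so some misclassified words may remain.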
