Web-crawling-and-sentiment-analysis-of-114-URLs-

We created a detailed report on the readability, complexity, and sentiment of text data extracted from 114 websites. We used the Beautiful Soup library to crawl the 114 URLs and extract the text, then cleaned and preprocessed it and used the NLTK library in Python for the analysis.
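The crawling step described above can be sketched as follows. This is a minimal illustration rather than the repository's actual script: the function names `extract_text`, `fetch_html`, and `crawl`, the choice of `<p>` tags as the article body, the `User-Agent` header, and the use of the standard library's `urllib` for downloading are all assumptions.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def extract_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements that never contain article text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Assumption: the article body lives in <p> tags.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n".join(p for p in paragraphs if p)


def fetch_html(url, timeout=10.0):
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    # Assumption: a browser-like User-Agent, since some sites reject the default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls):
    """Map each URL to its extracted text, or to None if the download failed."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_text(fetch_html(url))
        except OSError:  # URLError and socket timeouts are OSError subclasses
            results[url] = None
    return results
```

Keeping the parsing in `extract_text` separate from the download makes it possible to check the extraction on a saved HTML string without touching the network.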

The following analyses were performed for the assignment:

1. Sentence count per article
2. Cleaned, tokenized articles with punctuation and stopwords removed
3. Word count and average word length per article, i.e. the total number of characters across all words divided by the total number of words
4. Word count per cleaned article
5. Syllable count, complex word count, and personal pronoun count
6. Positivity and negativity scores
7. A pandas DataFrame collecting all the variables we calculated
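The per-article metrics listed above can be sketched roughly as below. To stay self-contained, this sketch swaps in regular expressions and tiny hard-coded word lists for NLTK's tokenizers and stopword corpus, so it is not the repository's implementation. The word lists, the vowel-group syllable heuristic, the "more than two syllables" definition of a complex word, the pronoun list, and the names `count_syllables` and `analyze` are all assumptions.

```python
import re

# Illustrative stand-ins (assumptions); a real run would use a full stopword
# list and proper positive/negative lexicons.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it"}
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "sad"}
PRONOUNS = re.compile(r"\b(I|we|my|ours|us)\b", re.IGNORECASE)


def count_syllables(word):
    """Approximate syllables as vowel groups, ignoring a trailing 'es' or 'ed'."""
    word = word.lower()
    if word.endswith(("es", "ed")):
        word = word[:-2]
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def analyze(text):
    """Compute the readability and sentiment counts for one article."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    # Cleaned tokens: lowercased, punctuation already dropped, stopwords removed.
    cleaned = [w.lower() for w in words if w.lower() not in STOPWORDS]
    syllables = [count_syllables(w) for w in cleaned]
    # Skip the all-caps "US" so the country name is not counted as a pronoun.
    pronouns = [m for m in PRONOUNS.findall(text) if m != "US"]
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "cleaned_word_count": len(cleaned),
        "syllable_count": sum(syllables),
        "complex_word_count": sum(n > 2 for n in syllables),
        "personal_pronouns": len(pronouns),
        "positive_score": sum(w in POSITIVE for w in cleaned),
        "negative_score": sum(w in NEGATIVE for w in cleaned),
    }
```

Calling `analyze` once per article yields a list of dictionaries, which can be passed directly to `pandas.DataFrame` to build the final table with one row per URL.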

NLP_assignment.py
Output Data Structure.xlsx
README.md
beautiful_soup.py
dataframe.xlsx

Souravdani/Web-crawling-and-sentiment-analysis-of-114-URLs-
