


Rishabh1928/Company_Classification


Company_Classification

We are given web-scraped data about various businesses and companies. The goal is to categorize these businesses against a standard taxonomy (a set of term names and labels that are specific to an organization's information and unique to how that business operates), so that a business can leverage this information to target potential companies.

Overview of the Dataset

Website: The website of the company/business

Company Name: The company/business name

Homepage Text: Visible homepage text

H1: The heading 1 (<h1>) tags from the homepage HTML

H2: The heading 2 (<h2>) tags from the homepage HTML

H3: The heading 3 (<h3>) tags from the homepage HTML

Navlink text: The visible titles of navigation links on the homepage (e.g. Home, Services, Product, About Us, Contact Us)

Meta keywords: The meta keywords in the HTML <head> of the page, used for SEO

Meta description: The meta description in the HTML <head> of the page, used for SEO
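
Because many scraped pages lack some of these fields, missing values have to be handled before any text processing. The sketch below shows one plausible way to do this; it is not the repository's code. It assumes pandas, assumes the columns are named exactly as in the list above, and uses a hypothetical `build_documents` helper that fills missing fields with empty strings and concatenates them into one document per company, dropping companies with no text at all.

```python
import pandas as pd

# Assumed column names, taken from the field list above; rename to match
# the headers actually present in the scraped file.
TEXT_COLUMNS = [
    "Company Name",
    "Homepage Text",
    "H1",
    "H2",
    "H3",
    "Navlink text",
    "Meta keywords",
    "Meta description",
]


def build_documents(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge each company's text fields into one document.

    Missing values become empty strings instead of causing the whole row to be
    dropped, so a site without homepage text can still be described by its
    headings or meta tags. Rows that end up with no text at all are removed.
    """
    present = [col for col in TEXT_COLUMNS if col in df.columns]
    text = (
        df[present]
        .fillna("")                  # NaN/None -> empty string
        .astype(str)
        .apply(" ".join, axis=1)     # concatenate the fields row by row
        .str.split()
        .str.join(" ")               # collapse whitespace left by empty fields
    )
    docs = df.assign(document=text)
    return docs[docs["document"] != ""].reset_index(drop=True)
```

Filling rather than dropping is a design choice: dropping every row with any missing field would discard most scraped sites, since few pages populate all eight fields.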

Approach to the problem

  1. Dealing with missing values
  2. Text preprocessing
  3. Vectorization
  4. Clustering
  5. Labelling the Clusters
  6. Check the distribution of clusters

All the necessary documents and the notebook are uploaded to this repository.
